Abstract



Automated reasoning about uncertain knowledge has many applications. One difficulty
when developing such systems is the lack of a completely satisfactory integration of
logic and probability. We address this problem directly. Expressive languages like
higher-order logic are ideally suited for representing and reasoning about structured knowledge.
Uncertain knowledge can be modeled by using graded probabilities rather than binary
truth values. The main technical problem studied in this paper is the following: Given
a set of sentences, each having some probability of being true, what probability should
be ascribed to other (query) sentences? A natural wish-list, among others, is that the
probability distribution (i) is consistent with the knowledge base, (ii) allows for a consistent
inference procedure and in particular (iii) reduces to deductive logic in the limit of
probabilities being 0 and 1, (iv) allows (Bayesian) inductive reasoning and (v) learning
in the limit and in particular (vi) allows confirmation of universally quantified
hypotheses/sentences. We translate this wish-list into technical requirements for a prior
probability and show that probabilities satisfying all our criteria exist. We also give explicit
constructions and several general characterizations of probabilities that satisfy some or
all of the criteria and various (counter) examples. We also derive necessary and sufficient
conditions for extending beliefs about finitely many sentences to suitable probabilities
over all sentences, and in particular least dogmatic or least biased ones. We conclude
with a brief outlook on how the developed theory might be used and approximated in
autonomous reasoning agents. Our theory is a step towards a globally consistent and
empirically satisfactory unification of probability and logic.



1 Presented at the Fifth Workshop on Combining Probability and Logic (Progic 2011) in New York. 
"The study of probability functions defined over the sentences of a rich enough formal
language yields interesting insights in more than one direction."

— Haim Gaifman (1982) 



1 Introduction 



Motivation. Sophisticated computer applications generally require expressive languages for
knowledge representation and reasoning. In particular, such languages need to be able to
represent both structured knowledge and uncertainty [Nil86, Hal03, Mug96, DK03, RD06, Haj01,
Wil02]. A suitable language for this purpose is higher-order logic [Chu40, Hen50, And02, Llo03,
vBD83, Lei94, Sha01], which admits higher-order functions that can take functions as arguments
and/or return functions as results. This facility is convenient for probabilistic modeling since it
means that theories can contain probability densities [Far08, Pfe07, GMR+08]. In particular,
many forms of probabilistic reasoning can be done in higher-order logic using the traditional
axiomatic method: a theory can be written down which has the intended interpretation as a
model and then conventional proof and computation techniques can be used to answer queries
[NL09, NLU08]. While such a computational approach is effective, it is sometimes more natural
to pose a problem as one where the probability of some sentences in the theory being true may
be strictly less than one and/or the query sentence (and its negation) may not be a logical
consequence of the theory. In such cases, deductive reasoning does not suffice for answering queries
and it becomes necessary to use probabilistic methods [Par94, KD07, RD06, Mug96, MR07].



Main aim. These considerations lead to the main technical issue studied in this paper: 

Given a set of sentences, each having some probability of being true, 
what probability should be ascribed to other (query) sentences? 

We build on the work of Gaifman [Gai64] whose paper with Snir [GS82] develops a quite
comprehensive theory of probabilities on sentences in first-order Peano arithmetic. We take up
these ideas, using non-dogmatic priors [GS82] and additionally the minimum relative entropy
principle as in [Wil08a], but for general theories and in a higher-order setting. We concentrate on
developing probabilities on sentences in a higher-order logic. This sets the stage for combining
it with the probabilities inside sentences approach [NL09, NLU08].

Summary of key concepts. Section 2 introduces higher-order logic and its relevant properties.
We use the higher-order logic (Definitions 1, 2 and 8) based on Church's simple theory of
types [Chu40, Hen50, And02]. We employ the Henkin semantics and make use of a particular
class of interpretations, called separating interpretations (Definition 12).

Section 3 gives the definition of probabilities on sentences in higher-order logic
(Definition 17), introduces the Gaifman condition, and develops some basic properties of such
probabilities. Section 4 then introduces probabilities on interpretations and shows their close
connection with probabilities on sentences. Gaifman [Gai64] (generalized in Definition 20 and
Propositions 21, 22 and 23) introduced a condition, called Gaifman in [SK66], that connects
probabilities of quantified sentences to limits of probabilities of finite conjunctions. In our case,
it effectively restricts probabilities to separating interpretations while maintaining countable
additivity.

While countable additivity (CA) is generally accepted in probability theory (Definition 28),
some circles argue that it does not have a good philosophical justification, and/or that it is
not needed since real experience is always finite, hence only non-asymptotic statements are
of practical relevance, for which CA is not needed. On the other hand, it is usually much
easier to first obtain asymptotic statements, which requires CA, and then improve upon them.
Furthermore we will show that CA can guide us in the right direction to find good finitary prior
probabilities.

Another principle which has received much less attention than CA but is equally if not more
important is that of Cournot [Cou43, Sha06]: An event of probability (close to) zero singled
out in advance is physically impossible; or conversely, an event of probability 1 will physically
happen for sure. In short: zero probability means impossibility. The history of the semantics
of probability is stony [Fin73]. Cournot's "forgotten" principle is one way of giving meaning
to probabilistic statements like "the relative frequency of heads of a fair coin converges to 1/2
with probability 1". The contraposition of Cournot is that one must assign non-zero probability
to possible events. If "events" are described by sentences and "possible" means it is possible to
satisfy these sentences, i.e. they possess a model, then we arrive at the strong Cournot principle
that satisfiable sentences should be assigned non-zero probability (Definitions 25 and 35). This
condition has been appropriately called 'non-dogmatic' in [GS82]. As long as something is not
proven false, there is a (small) chance it is valid in the intended interpretation. This
non-dogmatism is crucial in Bayesian inductive reasoning, since no evidence (however strong) can
increase a zero prior belief to a non-zero posterior belief [RH11]. The Gaifman condition is
inconsistent with the strong Cournot principle (Example 43), but consistent with a weaker
version (Definition 26). Probabilities that are Gaifman and (plain, not strong) Cournot allow
learning in the limit (Theorem 27 and Corollary [M]).
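
The role of non-dogmatism in Bayesian updating can be checked on a two-hypothesis toy case. The following is only a minimal sketch: the hypothesis names, the likelihood values and the helper `posterior` are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes update over a finite set of hypotheses.

    prior, likelihood: dicts mapping a hypothesis name to a Fraction.
    The evidence must have positive probability under the prior.
    """
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    if evidence == 0:
        raise ValueError("evidence has probability zero under the prior")
    return {h: prior[h] * likelihood[h] / evidence for h in prior}

# Hypothetical numbers: the data fit H perfectly and fit not_H poorly.
likelihood = {"H": Fraction(1), "not_H": Fraction(1, 100)}

dogmatic = {"H": Fraction(0), "not_H": Fraction(1)}             # H excluded a priori
open_minded = {"H": Fraction(1, 1000), "not_H": Fraction(999, 1000)}

post_dogmatic = posterior(dogmatic, likelihood)
post_open = posterior(open_minded, likelihood)
```

With a zero prior the posterior of `H` stays at zero whatever the likelihood, whereas a small positive prior is increased by the same evidence.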

A standard way to construct (general / Cournot / Gaifman) probabilities on sentences is to
construct (general / non-dogmatic / separating) probabilities on interpretations, and then
transfer them to sentences (Propositions 25, 32 and 38). At the same time we give model-theoretic
characterizations of the Gaifman condition (Corollary 34) and the Cournot condition
(Definition |3T}). In Section 5 we give a particularly simple construction of a probability that is Cournot
and Gaifman (Theorem 40) and a complete characterization of general/Cournot/Gaifman
probabilities (Theorems loTH and [5^1 and Corollary [5^1). We also give various examples of (strong)
(non)Cournot and/or Gaifman probabilities and (non)separating interpretations for countable
domains (Examples 46, 47 and 48) and finite domains (Examples 42, 43, 44 and 45).

Section 7 considers the important practical situation of whether a real-valued function on
a set of sentences can be extended to a probability on all sentences; a necessary and sufficient
condition is given for this, as is a method for determining such probabilities using minimum
relative entropy introduced in Section 6. Prior knowledge and data constrain our (belief)
probabilities in various ways, which we need to take into account when constructing probabilities.
Prior knowledge is usually given in the form of probabilities on sentences like "the coin has head
probability 1/2", or facts like "all electrons have the same charge", or non-logical axioms like
"there are infinitely many natural numbers". They correspond to requiring their probability to
be 1/2, extremely close to 1, and 1, respectively. It is therefore necessary to be able to go from
probabilities on sentences to probability on interpretations (Proposition 31). This allows us to
prove various necessary and sufficient conditions under which such partial probability
specifications can be completed and what properties they have (Propositions I5T1 and loT)]). In particular
we show that hierarchical probabilistic knowledge (Definitions loT]) is always probabilistically
consistent (Proposition 63). Further, seldom does knowledge constrain the probability on all
sentences to be uniquely determined. In this case it is natural to choose a probability that is
least dogmatic or biased [Nil86, Wil08a]. The minimum relative entropy principle (Definition 54)
can be used to construct such a unique minimally more informative probability that is
consistent with our prior knowledge (Definition 53 and Propositions 151)1 and I5T]).

Section 8 is a brief outlook on how the developed theory might be used and approximated in
autonomous reasoning agents. In particular, certain knowledge, learning in the limit (64), the
infamous black raven paradox, and the Monty Hall problem are discussed, but only briefly. The
paper ends with a more detailed discussion in Section 9 of the broader context and motivation of
this work, as well as related results in the literature, the outline of a framework for probabilistic
reasoning and modeling in higher-order logic, and future research directions.

While some of the results presented in this paper are known in the first-order case and their 
extension to the higher-order case is straightforward, it nevertheless seems useful to provide 
a survey of this material (with proofs included). Also, many beautiful ideas in the long and 
technical paper by Gaifman [GS82] deserve wider attention than they have received. We hope 
our exposition helps to rectify this situation. 

2 Logic 

We review here a standard formulation of higher-order logic [And02] that is based on Church's
simple theory of types [Chu40]. Other references on higher-order logic include [Llo03, Far08,
vBD83, Lei94, Sha01]. Some discussion of the interesting history of the simple theory of types
is given in [And02, Far08].

The best way to think about higher-order logic is that it is the formalization of everyday 
informal mathematics: whatever mathematical description one might give of some situation, the 
formalization of that situation in higher-order logic is likely to be a straightforward translation 
of the informal description. In particular, higher-order logic provides a suitable foundation 
for mathematics itself which has several advantages over more traditional approaches that are 
based on axiomatizing sets in first-order logic. Furthermore, higher-order logic is the logical 
formalism of choice for much of theoretical computer science and also applications areas such as 
software and hardware verification. For a convincing account of the advantages of higher-order 
over first-order logic in computer science, see [Far08].

The logic presented here differs in a minor way from that in [And02] in that we omit the
description operator $\iota$, for reasons that are discussed later. All the results from [And02] that
are used here also hold for the logic with $\iota$ omitted, by obvious changes to their proofs. In
addition the notation for the logic used here differs somewhat from that in [And02], but the
correspondences will always be clear. There are also a few differences in terminology here
compared to [And02] that are noted along the way.

We begin with the definition of a type. 

Definition 1 (type $\alpha$) A type is defined inductively as follows.

1. $o$ is a type.

2. $\iota$ is a type.

3. If $\alpha$ and $\beta$ are types, then $\alpha \to \beta$ is a type.

In this definition, $o$ is the type of the truth values, $\iota$ is the type of individuals, and $\alpha \to \beta$
is the type of functions from elements of type $\alpha$ to elements of type $\beta$. We use the convention
that $\to$ is right associative. So, for example, when we write $\alpha \to \beta \to \gamma \to \kappa$ we mean
$\alpha \to (\beta \to (\gamma \to \kappa))$. A function type is a type of the form $\alpha \to \beta$, for some $\alpha$ and $\beta$.
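
The inductive definition of types and the right-associativity convention can be mirrored in a small data structure. This is a minimal sketch; the class names `Base` and `Arrow` and the helpers `arrow` and `show` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Base:
    name: str          # "o" (truth values) or "i" (individuals)

@dataclass(frozen=True)
class Arrow:
    dom: "Type"        # argument type
    cod: "Type"        # result type

Type = Union[Base, Arrow]

O = Base("o")
I = Base("i")

def arrow(*types):
    """Build t1 -> t2 -> ... -> tn, associating to the right."""
    if not types:
        raise ValueError("at least one type is required")
    if len(types) == 1:
        return types[0]
    return Arrow(types[0], arrow(*types[1:]))

def show(t):
    """Print a type, adding parentheses only where right associativity needs them."""
    if isinstance(t, Base):
        return t.name
    left = show(t.dom)
    if isinstance(t.dom, Arrow):
        left = "(" + left + ")"
    return left + " -> " + show(t.cod)
```

For instance, `arrow(I, I, O)` builds the type of binary predicates on individuals as `Arrow(I, Arrow(I, O))`.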

There is a denumerable list of variables of each type. The logical constants are $=_{\alpha\to\alpha\to o}$, for
each type $\alpha$. The denotation of equality $=_{\alpha\to\alpha\to o}$ is the identity relation between individuals of
type $\alpha$. In addition, there may be other non-logical constants of various types. The alphabet is
the set of all variables and constants.

Next comes the definition of a term. 

Definition 2 (term $t$) A term, together with its type, is defined inductively as follows.

1. A variable of type $\alpha$ is a term of type $\alpha$.

2. A constant of type $\alpha$ is a term of type $\alpha$.

3. If $t_\beta$ is a term of type $\beta$ and $x_\alpha$ a variable of type $\alpha$, then $\lambda x_\alpha.t_\beta$ is a term of type $\alpha \to \beta$.

4. If $s_{\alpha\to\beta}$ is a term of type $\alpha \to \beta$ and $t_\alpha$ a term of type $\alpha$, then $(s_{\alpha\to\beta}\; t_\alpha)$ is a term of type
$\beta$.

A formula is a term of type $o$. A closed term is a term with no free variables. A sentence is a
closed formula. A theory is a set of formulas.
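
The four clauses of Definition 2 translate directly into a type-computing function. The sketch below assumes a hypothetical tuple encoding of terms and represents the types as the strings `"o"` and `"i"` or as pairs `(dom, cod)`; none of these names come from the paper.

```python
# Types: "o", "i", or a pair (dom, cod) standing for the function type dom -> cod.
# Terms (hypothetical encoding):
#   ("var", name, type), ("const", name, type),
#   ("lam", variable_term, body), ("app", function_term, argument_term)

def type_of(term):
    """Return the type of a term, following the four clauses of Definition 2."""
    tag = term[0]
    if tag in ("var", "const"):              # clauses 1 and 2
        return term[2]
    if tag == "lam":                         # clause 3: abstraction
        _, var, body = term
        if var[0] != "var":
            raise TypeError("an abstraction must bind a variable")
        return (type_of(var), type_of(body))
    if tag == "app":                         # clause 4: application
        _, fun, arg = term
        fun_type = type_of(fun)
        if not isinstance(fun_type, tuple) or fun_type[0] != type_of(arg):
            raise TypeError("ill-typed application")
        return fun_type[1]
    raise ValueError("unknown term constructor")

def is_formula(term):
    """A formula is a term of type o."""
    return type_of(term) == "o"

x = ("var", "x", "i")
p = ("const", "p", ("i", "o"))
identity = ("lam", x, x)        # lambda x. x, of type i -> i
px = ("app", p, x)              # (p x), a formula
```

An application such as `("app", x, x)` is rejected because `x` does not have a function type.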

If the set of non-logical constants is countable, then the set of terms is denumerable. As
shown in [And02, p. 212], using equality, it is easy to define $\top_o$ (truth), $\bot_o$ (falsity), $\land_{o\to o\to o}$
(conjunction), $\lor_{o\to o\to o}$ (disjunction), $\neg_{o\to o}$ (negation), $\forall x_\alpha.t_o$ (universal quantification), and
$\exists x_\alpha.t_o$ (existential quantification). The axioms for the logic are as follows [And02, p. 213]:

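Axiom 3 (logical axioms)

1. Truth values: $(g_{o\to o}\,\top_o) \land (g_{o\to o}\,\bot_o) = \forall x_o.(g_{o\to o}\,x_o)$

2. Leibniz' law: $(x_\alpha = y_\alpha) \to ((h_{\alpha\to o}\,x_\alpha) = (h_{\alpha\to o}\,y_\alpha))$

3. Extensionality: $(f_{\alpha\to\beta} = g_{\alpha\to\beta}) = \forall x_\alpha.((f_{\alpha\to\beta}\,x_\alpha) = (g_{\alpha\to\beta}\,x_\alpha))$

4. $\beta$-reduction: $(\lambda x_\alpha.t_\beta\; s_\alpha) = t_\beta\{x_\alpha/s_\alpha\}$ (provided that $s_\alpha$ is free for $x_\alpha$ in $t_\beta$)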
In the above, $g_{o\to o}$, ... are variables of the indicated type, $x_\alpha$ is a syntactical variable for
variables of type $\alpha$, and $t_\beta$, ... are syntactical variables for terms of the indicated type. Also
$t_\beta\{x_\alpha/s_\alpha\}$ is the result of simultaneously substituting $s_\alpha$ for all free occurrences of $x_\alpha$ in $t_\beta$.

Axiom (1) expresses the idea that truth and falsity are the only truth values; Axioms (2)
(for each type $\alpha$) express a basic property of equality; Axioms (3) (for each type $\alpha \to \beta$) are
the axioms of extensionality; and Axiom schema (4) is the axiom for $\beta$-reduction.

Here is the single rule of inference [And02, p. 213]:

Rule 4 (rule of inference; equality substitution) From $t_o$ and $s_\alpha = r_\alpha$, infer the result
of replacing one occurrence of $s_\alpha$ in $t_o$ by an occurrence of $r_\alpha$, provided that the occurrence of
$s_\alpha$ in $t_o$ is not (an occurrence of a variable) immediately preceded by a $\lambda$.

The logic also has an equational reasoning system that has been used as the computational
basis for a functional logic programming language [Llo03, NL09, NLU08, LN11].

In the following, to simplify the notation, we usually omit the type subscripts on terms;
the type of a term will always either be unimportant or clear from the context. We use
$\varphi$, $\chi$, $\psi$ for sentences and sometimes for formulas, and $t$, $r$, $s$ for terms. With this notation,
$\forall x.\varphi = [\lambda x.\varphi = \lambda x.\top]$ and $\exists x.\varphi = [\lambda x.\varphi \neq \lambda x.\bot]$.

The logic includes Church's $\lambda$-calculus: a term of the form $\lambda x.t$ is an abstraction and a term
of the form $(s\; t)$ is an application.

The logic is given a conventional Henkin semantics [Hen50].

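Definition 5 (frame $\{\mathcal{D}_\alpha\}_\alpha$) A frame is a collection $\{\mathcal{D}_\alpha\}_\alpha$ of non-empty sets, one for each
type $\alpha$, satisfying the following conditions.

1. $\mathcal{D}_o = \{\mathsf{T}, \mathsf{F}\}$.

2. $\mathcal{D}_{\beta\to\gamma}$ is some collection of functions from $\mathcal{D}_\beta$ to $\mathcal{D}_\gamma$.

For each type $\alpha$, $\mathcal{D}_\alpha$ is called a domain.

The members of $\mathcal{D}_o$ are called the truth values and the members of $\mathcal{D}_\iota$ are called individuals.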

Definition 6 (valuation $V$) Given a frame $\{\mathcal{D}_\alpha\}_\alpha$, a valuation $V$ is a function that maps
each constant having type $\alpha$ to an element of $\mathcal{D}_\alpha$ such that $V(=_{\alpha\to\alpha\to o})$ is the function from $\mathcal{D}_\alpha$
into $\mathcal{D}_{\alpha\to o}$ defined by

\[ V(=_{\alpha\to\alpha\to o})\, x\, y = \begin{cases} \mathsf{T} & \text{if } x = y, \\ \mathsf{F} & \text{otherwise,} \end{cases} \]

for $x, y \in \mathcal{D}_\alpha$.

Definition 7 (variable assignment $\nu$) A variable assignment $\nu$ with respect to a frame
$\{\mathcal{D}_\alpha\}_\alpha$ is a function that maps each variable of type $\alpha$ to an element of $\mathcal{D}_\alpha$.

An interpretation can now be defined.

Definition 8 (interpretation $(\{\mathcal{D}_\alpha\}_\alpha, V)$) A pair $I = (\{\mathcal{D}_\alpha\}_\alpha, V)$ is an interpretation if
there is a function $\mathcal{V}$ such that, for each variable assignment $\nu$ and for each term $t$ of type $\alpha$,
$\mathcal{V}(t, I, \nu) \in \mathcal{D}_\alpha$ and the following conditions are satisfied.

1. $\mathcal{V}(x, I, \nu) = \nu(x)$, where $x$ is a variable.

2. $\mathcal{V}(C, I, \nu) = V(C)$, where $C$ is a constant.

3. $\mathcal{V}(\lambda x.s, I, \nu)$ = the function whose value for each $d \in \mathcal{D}_\beta$ is $\mathcal{V}(s, I, \nu')$, where $\lambda x.s$ has
type $\beta \to \gamma$ and $\nu'$ is $\nu$ except $\nu'(x) = d$.

4. $\mathcal{V}((r\; s), I, \nu) = \mathcal{V}(r, I, \nu)(\mathcal{V}(s, I, \nu))$.

If $(\{\mathcal{D}_\alpha\}_\alpha, V)$ is an interpretation, then the function $\mathcal{V}$ is uniquely defined. $\mathcal{V}(t, I, \nu)$ is called
the denotation of $t$ with respect to $I$ and $\nu$. If $t$ is a closed term, then $\mathcal{V}(t, I, \nu)$ is independent
of $\nu$ and we write it as $\mathcal{V}(t, I)$. Not every pair $(\{\mathcal{D}_\alpha\}_\alpha, V)$ is an interpretation; to be an
interpretation, every term must have a denotation with respect to each variable assignment.

What is called an interpretation here is called a general model in [And02], following Henkin.
In [And02], a general model is called a standard model if, for each $\alpha$ and $\beta$, $\mathcal{D}_{\alpha\to\beta}$ is the set of
all functions from $\mathcal{D}_\alpha$ to $\mathcal{D}_\beta$. Moving from standard models to general models was the crucial
step that allowed Henkin to prove the completeness of the logic [Hen50].

Definition 9 (satisfiable) Let $t$ be a formula, $I = (\{\mathcal{D}_\alpha\}_\alpha, V)$ an interpretation, and $\nu$ a
variable assignment with respect to $\{\mathcal{D}_\alpha\}_\alpha$.

1. $\nu$ satisfies $t$ in $I$ if $\mathcal{V}(t, I, \nu) = \mathsf{T}$.

2. $t$ is satisfiable in $I$ if there is a variable assignment which satisfies $t$ in $I$.

3. $t$ is valid in $I$ if every variable assignment satisfies $t$ in $I$.

4. $t$ is valid if $t$ is valid in every interpretation.

5. A model for a theory is an interpretation in which each formula in the theory is valid.

Definition 10 (consistency) A theory is consistent if _L cannot be derived from the theory. 

Definition 11 (logical consequence) A formula t is a logical consequence of a theory if t 
is valid in every model of the theory. 

We will have need for a particular class of interpretations, defined as follows. 

Definition 12 (separating interpretation/model) An interpretation $I$ for an alphabet is
separating if, for every pair $r$, $s$ of closed terms of the same function type, say, $\alpha \to \beta$, such
that $\mathcal{V}(r, I) \neq \mathcal{V}(s, I)$, there exists a closed term $t$ of type $\alpha$ such that $\mathcal{V}((r\; t), I) \neq \mathcal{V}((s\; t), I)$.
A separating model is a separating interpretation that is a model (for some set of formulas).

We emphasize that, in the definition of a separating interpretation, the closed term $t$ is
formed only from symbols in the given alphabet. Intuitively, an interpretation is separating if,
for every pair $r$, $s$ of closed terms of the same type $\alpha \to \beta$, whose respective denotations in
the interpretation are different, there exists a closed term $t$ of type $\alpha$ for which the
respective denotations in the interpretation of $(r\; t)$ and $(s\; t)$ are different. Thus, in a separating
interpretation, closed terms that have distinct functions as denotations must be distinct on an
argument in the domain that is the denotation of some closed term using the given alphabet
and thus is 'accessible' or 'nameable' via that term.
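
The 'nameability' requirement can be illustrated on a finite toy frame. The sketch below is a simplification under stated assumptions: it looks at a single function type (individuals to truth values), represents denotations as Python dicts, and takes the set `named` of individuals denoted by closed terms as given. All names are hypothetical.

```python
def separated(f, g, named):
    """True if some named argument distinguishes the functions f and g."""
    return any(f[a] != g[a] for a in named)

def is_separating(functions, named):
    """Check the separating condition for one function type.

    functions: dict mapping a closed-term name to its denotation (a dict
               from domain elements to truth values).
    named:     domain elements that are denotations of closed terms.
    """
    names = list(functions)
    for i, r in enumerate(names):
        for s in names[i + 1:]:
            fr, fs = functions[r], functions[s]
            # Distinct denotations must differ on some nameable argument.
            if fr != fs and not separated(fr, fs, named):
                return False
    return True

# Domain of individuals {0, 1, 2}; r and s differ only on the element 2.
functions = {
    "r": {0: True, 1: True, 2: True},
    "s": {0: True, 1: True, 2: False},
}

only_two_named = is_separating(functions, named={0, 1})
all_named = is_separating(functions, named={0, 1, 2})
```

When the element 2 has no name, `r` and `s` denote different functions that no closed term can tell apart, so the toy interpretation is not separating; naming 2 restores the property.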

The concept of a separating interpretation is closely related to the concept of an
extensionally complete theory that plays a crucial part in the proof of completeness [And02, p. 248].

Definition 13 (extensionally complete) A set $S$ of sentences is extensionally complete if,
for every pair $r$, $s$ of closed terms of the same function type, say, $\alpha \to \beta$, there exists a closed
term $t$ of type $\alpha$ such that $r \neq s \to (r\; t) \neq (s\; t)$ is derivable from $S$.
A connection with separating interpretations is provided by the following result.

Proposition 14 (extensionally complete $\Rightarrow$ separating) Every model of an
extensionally complete set of sentences is separating.

Proof. Let $S$ be a set of sentences that is extensionally complete and $I$ be a model for
$S$. Suppose that $r$, $s$ is a pair of closed terms of the same function type, say, $\alpha \to \beta$, such
that $\mathcal{V}(r, I) \neq \mathcal{V}(s, I)$. By extensional completeness, there exists a closed term $t$ such that
$r \neq s \to (r\; t) \neq (s\; t)$ is derivable from $S$. Since $I$ is a model for $S$ and the proof system is
sound, it follows that $\mathcal{V}((r\; t), I) \neq \mathcal{V}((s\; t), I)$. Hence $I$ is separating. ■



Now we show that, if we are willing to expand the alphabet, any set of sentences having a 
model also has a separating model in an expanded alphabet. 

Proposition 15 (existence of separating models) If a set $S$ of sentences has a model,
then there exists an alphabet that includes the original alphabet and an interpretation based on
the expanded alphabet which is a separating model for $S$.

Proof. Since $S$ has a model, $S$ is consistent. By [And02, Theorem 5500], there is an expansion
of the original alphabet and a set $T$ of sentences such that $S \subseteq T$, $T$ is consistent, and $T$ is
extensionally complete in the expanded alphabet. Since $T$ is consistent, by Henkin's Theorem
[And02, Theorem 5501], it has a model (based on the expanded alphabet). By Proposition 14,
this model must be a separating one, and it is also a model for $S$. ■



The most important property of the logic that we will need is compactness [And02, Theorem
5503].

Theorem 16 (compactness) If every finite subset of a set $S$ of sentences has a model, then
$S$ has a model.



In fact, most of the development in the paper can be carried out in any logic that has the 
compactness property. 

While the version of higher-order logic introduced in this section generally provides much
more direct and succinct formalisations than first-order logic, for practical applications a
number of extensions are highly desirable. Some of these extensions are nothing more than
abbreviations, such as those used to introduce the connectives and quantifiers, and some are deeper.
These extensions include many-sortedness, which allows more than one domain of individuals;
tuples and product types; and type constructors and polymorphism. The logic of [Llo03], which
is also used in [NL09, NLU08], includes all these extensions. These and other extensions are
discussed in [Far08].



3 Probabilities on Sentences 

We now define probabilities on sentences. They are not probabilities in the conventional sense
of probability theory (on $\sigma$-algebras); however, a connection between probabilities on sentences
and (conventional) probabilities on a $\sigma$-algebra on the set of interpretations will be made below.

Definition 17 (probability on sentences) Let $\mathcal{S}$ be the set of all sentences (for some
alphabet). A probability (on sentences) is a non-negative function $\mu : \mathcal{S} \to \mathbb{R}$ satisfying the
following conditions:

1. If $\varphi$ is valid, then $\mu(\varphi) = 1$.

2. If $\neg(\varphi \land \psi)$ is valid, then $\mu(\varphi \lor \psi) = \mu(\varphi) + \mu(\psi)$.

For a sentence $\psi$, where $\mu(\psi) > 0$, one can define the conditional probability $\mu(\cdot|\psi)$ by

\[ \mu(\varphi|\psi) = \frac{\mu(\varphi \land \psi)}{\mu(\psi)}, \]

for each sentence $\varphi$.

A probability $\mu : \mathcal{S} \to \mathbb{R}$ on sets of sentences has the following intended meaning:

For a sentence $\varphi$, $\mu(\varphi)$ is the degree of belief that $\varphi$ is true.
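
A finite propositional toy case may help to make Definition 17 concrete. The sketch below is not the paper's higher-order construction: it assumes two hypothetical atoms `a` and `b`, treats truth assignments as the only interpretations, and takes the weights as arbitrary illustrative numbers. A sentence is encoded as a Python predicate on truth assignments.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("a", "b")
# All truth assignments, in the order (T,T), (T,F), (F,T), (F,F).
WORLDS = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=len(ATOMS))]
# Hypothetical weights, one per truth assignment, summing to 1.
WEIGHTS = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

def mu(sentence):
    """Degree of belief in a sentence: total weight of the assignments satisfying it."""
    return sum((w for world, w in zip(WORLDS, WEIGHTS) if sentence(world)), Fraction(0))

def valid(sentence):
    """In this toy setting, valid means true under every truth assignment."""
    return all(sentence(world) for world in WORLDS)

def cond(phi, psi):
    """Conditional probability mu(phi | psi), defined only when mu(psi) > 0."""
    if mu(psi) == 0:
        raise ZeroDivisionError("conditioning on a sentence of probability zero")
    return mu(lambda w: phi(w) and psi(w)) / mu(psi)

a = lambda w: w["a"]
b = lambda w: w["b"]
not_a = lambda w: not w["a"]
a_or_not_a = lambda w: a(w) or not_a(w)
```

Here `a_or_not_a` is valid and receives probability 1 (condition 1), and since `a` and `not_a` are disjoint their probabilities add up to it (condition 2).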

Definition 18 (pairwise disjoint sentences) The sentences $\varphi_1, \ldots, \varphi_n$ are pairwise disjoint
if, for each $i, j = 1, \ldots, n$ such that $i \neq j$, $\neg(\varphi_i \land \varphi_j)$ is valid.

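Proposition 19 (properties of probability on sentences) Let $\mu : \mathcal{S} \to \mathbb{R}$ be a probability
on sentences. Then the following hold:

1. $\mu(\neg\varphi) = 1 - \mu(\varphi)$, for each $\varphi \in \mathcal{S}$.

2. $\mu(\varphi) \le 1$, for each $\varphi \in \mathcal{S}$.

3. If $\varphi$ is unsatisfiable, then $\mu(\varphi) = 0$.

4. If $\varphi \to \psi$ is valid, then $\mu(\varphi) \le \mu(\psi)$.

5. If $\varphi = \psi$ is valid, then $\mu(\varphi) = \mu(\psi)$.

6. If $\{\varphi_i\}_{i=1}^n$ is a finite subset of pairwise disjoint
sentences in $\mathcal{S}$, then $\mu(\bigvee_{i=1}^n \varphi_i) = \sum_{i=1}^n \mu(\varphi_i)$.

7. If $\{\varphi_i\}_{i=1}^n$ is a finite subset of $\mathcal{S}$, then $\mu(\bigvee_{i=1}^n \varphi_i) \le \sum_{i=1}^n \mu(\varphi_i)$.

8. The following are equivalent:

(a) For each $\varphi \in \mathcal{S}$, $\mu(\varphi) = 1$ implies $\varphi$ is valid.

(b) For each $\varphi \in \mathcal{S}$, $\mu(\varphi) = 0$ implies $\varphi$ is unsatisfiable.

9. If $\mu(\psi) > 0$, then $\mu(\cdot|\psi)$ is a probability.

10. $\mu(\varphi \lor \psi) + \mu(\varphi \land \psi) = \mu(\varphi) + \mu(\psi)$.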



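Proof. The proof is elementary and standard, and only included for completeness.

1. Since $\neg(\varphi \land \neg\varphi)$ is valid, $\mu(\varphi \lor \neg\varphi) = \mu(\varphi) + \mu(\neg\varphi)$. Also, since $\varphi \lor \neg\varphi$ is valid,
$\mu(\varphi \lor \neg\varphi) = 1$. Thus $\mu(\neg\varphi) = 1 - \mu(\varphi)$.

2. Since $1 - \mu(\varphi) = \mu(\neg\varphi) \ge 0$, we have that $\mu(\varphi) \le 1$.

3. Note that $\varphi$ is unsatisfiable iff $\neg\varphi$ is valid. Thus $\mu(\neg\varphi) = 1 - \mu(\varphi) = 1$, so that $\mu(\varphi) = 0$.

4. Note first that $\varphi \to \psi$ is valid iff $\neg(\varphi \land \neg\psi)$ is valid. Thus $\mu(\varphi \lor \neg\psi) =
\mu(\varphi) + \mu(\neg\psi) = \mu(\varphi) + 1 - \mu(\psi)$. Hence $\mu(\varphi) = \mu(\psi) + \mu(\varphi \lor \neg\psi) - 1 \le \mu(\psi)$.

5. This follows immediately from Part 4.

6. The proof is by induction on $n$. When $n = 1$ the result is obvious. Assume now the
result is true for $n - 1$. Note that $\bigwedge_{i=2}^n \neg(\varphi_1 \land \varphi_i)$ is valid and so $\neg(\varphi_1 \land \bigvee_{i=2}^n \varphi_i)$ is valid. Then

\[ \begin{aligned}
\mu(\textstyle\bigvee_{i=1}^n \varphi_i)
&= \mu(\varphi_1 \lor \textstyle\bigvee_{i=2}^n \varphi_i) \\
&= \mu(\varphi_1) + \mu(\textstyle\bigvee_{i=2}^n \varphi_i) && [\neg(\varphi_1 \land \textstyle\bigvee_{i=2}^n \varphi_i) \text{ is valid}] \\
&= \mu(\varphi_1) + \textstyle\sum_{i=2}^n \mu(\varphi_i) && [\text{induction hypothesis}] \\
&= \textstyle\sum_{i=1}^n \mu(\varphi_i).
\end{aligned} \]

7. The proof is by induction on $n$. When $n = 1$ the result is obvious. Assume now the
result is true for $n - 1$. Then

\[ \begin{aligned}
\mu(\textstyle\bigvee_{i=1}^n \varphi_i)
&= \mu((\varphi_1 \land \neg\textstyle\bigvee_{i=2}^n \varphi_i) \lor \textstyle\bigvee_{i=2}^n \varphi_i) \\
&= \mu(\varphi_1 \land \neg\textstyle\bigvee_{i=2}^n \varphi_i) + \mu(\textstyle\bigvee_{i=2}^n \varphi_i) \\
&\le \mu(\varphi_1) + \textstyle\sum_{i=2}^n \mu(\varphi_i) && [\text{Part 4 and induction hypothesis}] \\
&= \textstyle\sum_{i=1}^n \mu(\varphi_i).
\end{aligned} \]

8. Suppose that, for each $\varphi \in \mathcal{S}$, $\mu(\varphi) = 1$ implies $\varphi$ is valid. Now let $\psi \in \mathcal{S}$ satisfy
$\mu(\psi) = 0$. By Part 1, $\mu(\neg\psi) = 1$. Thus $\neg\psi$ is valid and so $\psi$ is unsatisfiable.

Conversely, suppose that, for each $\varphi \in \mathcal{S}$, $\mu(\varphi) = 0$ implies $\varphi$ is unsatisfiable. Now let
$\psi \in \mathcal{S}$ satisfy $\mu(\psi) = 1$. By Part 1, $\mu(\neg\psi) = 0$. Thus $\neg\psi$ is unsatisfiable and so $\psi$ is valid.

9. Suppose that $\varphi$ is valid. Then $\mu(\varphi|\psi) = \frac{\mu(\varphi \land \psi)}{\mu(\psi)} = \frac{\mu(\psi)}{\mu(\psi)} = 1$.
Suppose that $\neg(\varphi \land \chi)$ is valid. Then

\[ \begin{aligned}
\mu(\varphi \lor \chi|\psi)
&= \mu((\varphi \lor \chi) \land \psi) / \mu(\psi) \\
&= (\mu(\varphi \land \psi) + \mu(\chi \land \psi)) / \mu(\psi) && [\neg((\varphi \land \psi) \land (\chi \land \psi)) \text{ is valid}] \\
&= \mu(\varphi|\psi) + \mu(\chi|\psi).
\end{aligned} \]

Thus $\mu(\cdot|\psi)$ is a probability.

10. Let $\chi := \neg\varphi \land \psi$. Then

\[ \begin{aligned}
\mu(\varphi \lor \psi) + \mu(\varphi \land \psi)
&= \mu(\varphi \lor \chi) + \mu(\varphi \land \psi) && [\text{elementary logic}] \\
&= \mu(\varphi) + \mu(\chi) + \mu(\varphi \land \psi) && [\neg(\varphi \land \chi) \text{ is valid and Def. 17.2}] \\
&= \mu(\varphi) + \mu(\chi \lor (\varphi \land \psi)) && [\neg(\chi \land (\varphi \land \psi)) \text{ is valid and Def. 17.2}] \\
&= \mu(\varphi) + \mu(\psi). && [\text{elementary logic}]
\end{aligned} \]
■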
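
Several parts of Proposition 19 can be sanity-checked exhaustively in a finite propositional setting. The sketch below is an illustration under assumptions, not part of the proof: it uses four hypothetical "worlds" with arbitrary weights and identifies a sentence, up to logical equivalence, with the set of worlds in which it holds, so that disjunction, conjunction and valid implication become union, intersection and set inclusion.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical weights of four worlds, summing to 1.
WEIGHTS = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
WORLDS = range(len(WEIGHTS))
# Up to logical equivalence, a sentence is the set of worlds where it holds.
SENTENCES = [frozenset(c)
             for k in range(len(WEIGHTS) + 1)
             for c in combinations(WORLDS, k)]

def mu(s):
    """Probability of a sentence: total weight of the worlds satisfying it."""
    return sum((WEIGHTS[w] for w in s), Fraction(0))

def neg(s):
    """Negation corresponds to set complement."""
    return frozenset(WORLDS) - s

def check_proposition_19():
    """Check parts 1, 2, 4 and 10 for every sentence (pair) of the toy language."""
    for phi in SENTENCES:
        assert mu(neg(phi)) == 1 - mu(phi)                 # part 1
        assert mu(phi) <= 1                                # part 2
        for psi in SENTENCES:
            if phi <= psi:                                 # phi -> psi is valid
                assert mu(phi) <= mu(psi)                  # part 4
            assert mu(phi | psi) + mu(phi & psi) == mu(phi) + mu(psi)  # part 10
    return True
```

Exact `Fraction` arithmetic is used so that the equalities are checked without floating-point tolerance.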



Next we introduce Gaifman probabilities. 
Definition 20 (Gaifman probability) Let $\mu : \mathcal{S} \to \mathbb{R}$ be a probability on sentences. Then $\mu$
is Gaifman if

\[ \mu(r = s) = \inf_{\{t_1, \ldots, t_n\}} \mu\Big(\bigwedge_{i=1}^n ((r\; t_i) = (s\; t_i))\Big), \]

for every pair $r$ and $s$ of closed terms having the same function type, say, $\alpha \to \beta$, and where
$\{t_1, \ldots, t_n\}$ ranges over all finite sets of closed terms of type $\alpha$.

Proposition 21 (Gaifman probability) Let $\mu : \mathcal{S} \to \mathbb{R}$ be a probability on sentences. Then
the following are equivalent.

1. $\mu$ is Gaifman.

2. $\mu(r \neq s) = \sup_{\{t_1, \ldots, t_n\}} \mu\big(\bigvee_{i=1}^n ((r\; t_i) \neq (s\; t_i))\big)$,

for every pair $r$ and $s$ of closed terms having the same function type, say, $\alpha \to \beta$, and
where $\{t_1, \ldots, t_n\}$ ranges over all finite sets of closed terms of type $\alpha$.

3. $\mu(\exists x.\varphi) = \sup_{\{t_1, \ldots, t_n\}} \mu\big(\bigvee_{i=1}^n \varphi\{x/t_i\}\big)$,

for every formula $\varphi$ having a single free variable $x$ of type $\alpha$, say, and where $\{t_1, \ldots, t_n\}$
ranges over all finite sets of closed terms of type $\alpha$.

4. $\mu(\forall x.\varphi) = \inf_{\{t_1, \ldots, t_n\}} \mu\big(\bigwedge_{i=1}^n \varphi\{x/t_i\}\big)$,

for every formula $\varphi$ having a single free variable $x$ of type $\alpha$, say, and where $\{t_1, \ldots, t_n\}$
ranges over all finite sets of closed terms of type $\alpha$.

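Proof. 1. implies 2. Suppose that the probability $\mu$ is Gaifman. Then

\[ \begin{aligned}
\mu(r \neq s)
&= 1 - \mu(r = s) \\
&= 1 - \inf_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigwedge_{i=1}^n ((r\; t_i) = (s\; t_i))\big) \\
&= 1 - \inf_{\{t_1, \ldots, t_n\}} \mu\big(\neg\textstyle\bigvee_{i=1}^n ((r\; t_i) \neq (s\; t_i))\big) \\
&= 1 - \inf_{\{t_1, \ldots, t_n\}} \big(1 - \mu\big(\textstyle\bigvee_{i=1}^n ((r\; t_i) \neq (s\; t_i))\big)\big) \\
&= \sup_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigvee_{i=1}^n ((r\; t_i) \neq (s\; t_i))\big).
\end{aligned} \]

Hence 2. holds.

2. implies 3. Suppose that 2. holds. Then

\[ \begin{aligned}
\mu(\exists x.\varphi)
&= \mu(\lambda x.\varphi \neq \lambda x.\bot) \\
&= \sup_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigvee_{i=1}^n ((\lambda x.\varphi\; t_i) \neq (\lambda x.\bot\; t_i))\big) \\
&= \sup_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigvee_{i=1}^n \varphi\{x/t_i\}\big).
\end{aligned} \]

Hence 3. holds.

3. implies 4. Suppose that 3. holds. Then

\[ \begin{aligned}
\mu(\forall x.\varphi)
&= \mu(\neg\exists x.\neg\varphi) \\
&= 1 - \mu(\exists x.\neg\varphi) \\
&= 1 - \sup_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigvee_{i=1}^n \neg\varphi\{x/t_i\}\big) \\
&= 1 - \sup_{\{t_1, \ldots, t_n\}} \mu\big(\neg\textstyle\bigwedge_{i=1}^n \varphi\{x/t_i\}\big) \\
&= 1 - \sup_{\{t_1, \ldots, t_n\}} \big(1 - \mu\big(\textstyle\bigwedge_{i=1}^n \varphi\{x/t_i\}\big)\big) \\
&= \inf_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigwedge_{i=1}^n \varphi\{x/t_i\}\big).
\end{aligned} \]

Hence 4. holds.

4. implies 1. Suppose that 4. holds. Then

\[ \begin{aligned}
\mu(r = s)
&= \mu(\forall x.((r\; x) = (s\; x))) && [\text{Axioms of Extensionality}] \\
&= \inf_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigwedge_{i=1}^n ((r\; x) = (s\; x))\{x/t_i\}\big) \\
&= \inf_{\{t_1, \ldots, t_n\}} \mu\big(\textstyle\bigwedge_{i=1}^n ((r\; t_i) = (s\; t_i))\big).
\end{aligned} \]

Hence 1. holds. ■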



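Proposition 22 (limits for countable alphabet) Let the alphabet be countable, $\mu : \mathcal{S} \to \mathbb{R}$
a probability on sentences, and $\varphi$ a formula having a single free variable $x$ of type $\alpha$.

1. $\sup_{\{t_1, \ldots, t_n\}} \mu\big(\bigvee_{i=1}^n \varphi\{x/t_i\}\big) = \lim_{n\to\infty} \mu\big(\bigvee_{i=1}^n \varphi\{x/t_i\}\big)$

2. $\inf_{\{t_1, \ldots, t_n\}} \mu\big(\bigwedge_{i=1}^n \varphi\{x/t_i\}\big) = \lim_{n\to\infty} \mu\big(\bigwedge_{i=1}^n \varphi\{x/t_i\}\big)$,

where, on the LHS, $\{t_1, \ldots, t_n\}$ ranges over all finite sets of closed terms of type $\alpha$ and, on the
RHS, $t_1, t_2, \ldots$ is an enumeration of all closed terms of type $\alpha$.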



Proof. Since the alphabet is countable, the set of all closed terms of type $\alpha$ is countable and
hence can be enumerated.

1. Let $\{t'_1, \ldots, t'_m\}$ be a subset of closed terms of type $\alpha$. Let $n$ be sufficiently large so
that each $t'_j$, for $j = 1, \ldots, m$, appears in the enumeration $t_1, \ldots, t_n$ of the first $n$ terms of an
enumeration of all closed terms of type $\alpha$.

Then $\bigvee_{j=1}^m \varphi\{x/t'_j\} \to \bigvee_{i=1}^n \varphi\{x/t_i\}$ is valid, so that

\[ \mu\big(\textstyle\bigvee_{j=1}^m \varphi\{x/t'_j\}\big) \le \mu\big(\textstyle\bigvee_{i=1}^n \varphi\{x/t_i\}\big), \]

by Proposition 19.4. By first taking the supremum on the RHS and then the supremum on the
LHS we get

\[ \sup_{\{t'_1, \ldots, t'_m\}} \mu\big(\textstyle\bigvee_{j=1}^m \varphi\{x/t'_j\}\big) \le \sup_n \mu\big(\textstyle\bigvee_{i=1}^n \varphi\{x/t_i\}\big). \]

Conversely we have

\[ \sup_{\{t'_1, \ldots, t'_m\}} \mu\big(\textstyle\bigvee_{j=1}^m \varphi\{x/t'_j\}\big) \ge \mu\big(\textstyle\bigvee_{i=1}^n \varphi\{x/t_i\}\big), \]

since the sup on the LHS includes $\{t_1, \ldots, t_n\}$. Now taking the limit $n \to \infty$ and combining both
inequalities gives equality. Proposition 19.4 gives that $\mu(\varphi \lor \psi) \ge \mu(\varphi)$; hence $\mu\big(\bigvee_{i=1}^n \varphi\{x/t_i\}\big)$
is monotone non-decreasing in $n$, which allows the replacement of $\sup_n$ by $\lim_{n\to\infty}$.

2. The proof is similar. ■

We can reduce the class of terms that is necessary to "browse" through even further, by
considering only one term from each equivalence class, where two terms $t$ and $t'$ are equivalent
iff $t = t'$ is valid.

Proposition 23 (Gaifman for countable alphabet) Let the alphabet be countable and
$\mu : \mathcal{S} \to \mathbb{R}$ a probability on sentences. Then the following are equivalent.

1. $\mu$ is Gaifman.

2. $\mu(r = s) = \lim_{n\to\infty} \mu(\bigwedge_{i=1}^n ((r\ t_i) = (s\ t_i)))$,

for every pair $r$ and $s$ of closed terms having the same function type, say, $\alpha \to \beta$, and
where $t_1, t_2, \dots$ is an enumeration of all closed terms of type $\alpha$.

3. $\mu(r \ne s) = \lim_{n\to\infty} \mu(\bigvee_{i=1}^n ((r\ t_i) \ne (s\ t_i)))$,

for every pair $r$ and $s$ of closed terms having the same function type, say, $\alpha \to \beta$, and
where $t_1, t_2, \dots$ is an enumeration of all closed terms of type $\alpha$.

4. $\mu(\exists x.\varphi) = \lim_{n\to\infty} \mu(\bigvee_{i=1}^n \varphi\{x/t_i\})$,

for every formula $\varphi$ having a single free variable $x$ of type $\alpha$, say, and where $t_1, t_2, \dots$ is
an enumeration of all closed terms of type $\alpha$.

5. $\mu(\forall x.\varphi) = \lim_{n\to\infty} \mu(\bigwedge_{i=1}^n \varphi\{x/t_i\})$,

for every formula $\varphi$ having a single free variable $x$ of type $\alpha$, say, and where $t_1, t_2, \dots$ is
an enumeration of all closed terms of type $\alpha$.

In each case, the enumeration $t_1, t_2, \dots$ of closed terms of type $\alpha$ can be reduced to one where a
single representative is chosen from each equivalence class under the equivalence relation: $t$ and
$t'$ are equivalent if $t = t'$ is valid.

Proof. Two terms $t$ and $t'$ are said to be equivalent iff $t = t'$ is valid, which implies that
$\varphi\{x/t\} = \varphi\{x/t'\}$ is valid. This allows us to replace, in the proof of Proposition 22, 'appears' by
'is equivalent to some term in' and 'includes' by 'includes a term equivalent to some term in'.
Finally combine this with Proposition 21 and Definition 20. ■
While these forms of the Gaifman condition closely resemble the continuity condition (count-
able additivity (CA) axiom) in measure theory, we will see that CA over (general) interpre-
tations is derived from the compactness theorem and not from the Gaifman condition (see
Definition 28 and Proposition 30 in the next section). But the Gaifman condition confines
probabilities to separating interpretations while preserving CA (Propositions 29 and 31).

Example 24 (natural numbers Nat) Consider the standard type $Nat$ of natural numbers,
as the type of individuals, and the usual Peano axioms. Let $0$ be the constant of type $Nat$ whose
denotation is the natural number 0, and $n = S^n(0) = (S\ (S\ (S \cdots (S\ 0))))$ be the term of type
$Nat$ whose denotation is the natural number $n$, where $S$ is a constant of type $Nat \to Nat$ whose
denotation is the successor function. In practice one usually defines denumerably many con-
stants $1, 2, 3, \dots$, one for each natural number, directly. Further, let $+, \times : Nat \to Nat \to Nat$ be
functions with their usual axioms and meaning. Now there are many closed terms that represent
the same natural number. For instance $8$, $(\lambda x.x\ 8)$, $(3+5)$, $(2 \times 4)$ are different terms, all having
the number 8 as denotation. For type $Nat$, it is sufficient to choose $t_n = n$ in Proposition 23.4,
and so the condition in Definition 20 (indeed) reduces to the one used by Gaifman [GS82]. ◇

Of particular interest are probabilities that are strictly positive on satisfiable sentences since 
this is a desirable property of a prior. This suggests the following definition. 

Definition 25 (strongly Cournot probability) A probability $\mu : \mathcal{S} \to \mathbb{R}$ is strongly
Cournot if, for each $\varphi \in \mathcal{S}$, $\varphi$ is satisfiable implies $\mu(\varphi) > 0$.

By Part 8 of Proposition 19, a probability is strongly Cournot iff, for each $\varphi \in \mathcal{S}$, $\varphi$ is
not valid implies $\mu(\varphi) < 1$, or, by contraposition, $\mu(\varphi) = 1$ implies $\varphi$ is valid. This is akin
to Cournot's principle, as discussed in the introduction, that an event of probability 1 singled
out in advance will happen for sure in the real world. We will see this general idea plays an 
important role for inductive inference. 

However, the following weaker form of the Cournot principle will turn out to be more useful. 

Definition 26 (Cournot probability) A probability $\mu : \mathcal{S} \to \mathbb{R}$ is Cournot if, for each
$\varphi \in \mathcal{S}$, $\varphi$ has a separating model implies $\mu(\varphi) > 0$.

Clearly a strongly Cournot probability is Cournot. It will be the Cournot probabilities (not
the strongly Cournot ones) that will be of most interest in the subsequent development. The
major reasons for this are as follows. First, Theorem 40 below shows that, if the alphabet
is countable, there exists a probability on sentences that is Cournot and Gaifman. Such a
probability makes a good prior. Second, the Cournot and Gaifman conditions are necessary
and sufficient for learning in the limit of universal hypotheses, as the following theorem shows
and as discussed in more detail in Section 8.

Theorem 27 (confirming universal hypotheses) Let the alphabet be countable, $\mu : \mathcal{S} \to \mathbb{R}$
a probability on sentences, $\varphi$ a formula having a single free variable $x$ of some type $\alpha$, and $t_1, t_2, \dots$
an enumeration of (representatives of) all closed terms of type $\alpha$. Then

$$\mu(\forall x.\varphi \mid \bigwedge_{i=1}^n \varphi\{x/t_i\}) \xrightarrow{n\to\infty} 1 \iff \lim_{n\to\infty} \mu(\bigwedge_{i=1}^n \varphi\{x/t_i\}) = \mu(\forall x.\varphi) > 0.$$

If the left-hand side (hence also the right-hand side) holds, we say that $\mu$ can confirm the universal hypothesis
$\forall x.\varphi$. It also holds that

$$\mu \text{ can confirm all universal hypotheses that have a separating model} \iff \mu \text{ is Gaifman and Cournot.}$$

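Proof. (top $\Leftarrow$) Since $\forall x.\varphi \to \bigwedge_{i=1}^n \varphi\{x/t_i\}$ is valid, the definition of conditional probability gives

$$\lim_{n\to\infty} \mu(\forall x.\varphi \mid \bigwedge_{i=1}^n \varphi\{x/t_i\}) = \frac{\mu(\forall x.\varphi)}{\lim_{n\to\infty} \mu(\bigwedge_{i=1}^n \varphi\{x/t_i\})} = 1,$$

where the last equality uses both conditions on the right-hand side: $\lim_{n\to\infty} \mu(\bigwedge_{i=1}^n \varphi\{x/t_i\}) = \mu(\forall x.\varphi)$ and $\mu(\forall x.\varphi) > 0$.

(top $\Rightarrow$) As can be seen from the $\Leftarrow$ proof, if one or both of the conditions fail, then
$\mu(\forall x.\varphi \mid \bigwedge_{i=1}^n \varphi\{x/t_i\})$ does not converge to 1.

For the bottom $\Leftrightarrow$ we abbreviate the statements

$L(\varphi) := [\mu(\forall x.\varphi \mid \bigwedge_{i=1}^n \varphi\{x/t_i\}) \to 1 \text{ as } n \to \infty]$
$G(\varphi) := [\lim_{n\to\infty} \mu(\bigwedge_{i=1}^n \varphi\{x/t_i\}) = \mu(\forall x.\varphi)]$
$S(\varphi) := [\forall x.\varphi \text{ has a separating model}]$
$A(\varphi) := [\mu(\forall x.\varphi) > 0]$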

In this notation, the top $\Leftrightarrow$ reads: $L(\varphi)$ iff $G(\varphi)$ and $A(\varphi)$.

(bottom $\Leftarrow$) Assume $\mu$ is Gaifman and Cournot and $S(\varphi)$. This implies $G(\varphi)$ and $A(\varphi)$.
By top $\Leftarrow$ we get $L(\varphi)$. We have shown that for any $\varphi$, if $\mu$ is Gaifman and Cournot, then $S(\varphi)$
implies $L(\varphi)$.

(bottom $\Rightarrow$) Case 1 [$S(\varphi)$ is true] Then by assumption, $L(\varphi)$. Then by top $\Rightarrow$ we get $G(\varphi)$
and $A(\varphi)$. Note that every sentence $\psi$ can be written as $\psi = \forall x.\varphi$ with $\varphi := [\psi \wedge (x = x)]$
being a formula having a single free variable $x$. Therefore, $\mu(\psi) = \mu(\forall x.\varphi) > 0$ for all $\psi$ that
have a separating model. Hence $\mu$ is Cournot.

Case 2 [$S(\varphi)$ is false] That is, $\forall x.\varphi$ has no separating model, therefore $\neg\forall x.\varphi$ must have (at
least one) separating model, say $I$. Since $I$ is a separating model of $\exists x.\neg\varphi$, Definition 12 implies
that there exists a closed term $t$ such that $I$ is also a separating model of $\chi := \neg\varphi\{x/t\}$. Now

$\mu(\forall x.\varphi) + \mu(\chi)$

$= \mu(\forall x.\varphi \vee \chi)$ [$\forall x.\varphi$ and $\chi$ are disjoint]

$= \mu(\forall x.(\varphi \vee \chi))$ [$x$ is not free in $\chi$]

$= \lim_n \mu(\bigwedge_{i=1}^n (\varphi \vee \chi)\{x/t_i\})$ [since $S(\varphi \vee \chi)$, Case 1 implies $G(\varphi \vee \chi)$]

$= \lim_n \mu((\bigwedge_{i=1}^n \varphi\{x/t_i\}) \vee \chi)$ [$x$ is not free in $\chi$]

$= \lim_n \mu(\bigwedge_{i=1}^n \varphi\{x/t_i\}) + \mu(\chi)$ [$t = t_i$ for some $i$, and $\varphi\{x/t\} \wedge \chi$ is false]

This proves $G(\varphi)$ for $S(\varphi)$ false.

Cases 1 and 2 together prove $G(\varphi)$ for all $\varphi$, hence $\mu$ is Gaifman. ■
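
As an illustration of Theorem 27, the following minimal Python sketch computes the posterior of a universal hypothesis under a toy discrete prior. The prior is an assumption made for this example only and is not a construction from the text: mass `mass_all_true` sits on an interpretation in which B holds at every numeral, and the remaining mass is spread geometrically over interpretations whose first counterexample is the numeral k. All function names are hypothetical.

```python
from fractions import Fraction

def evidence_prob(n, mass_all_true):
    """mu(B(0) and ... and B(n-1)) under the toy prior.

    The world 'first counterexample at k' has mass (1 - mass_all_true) / 2**(k+1);
    it is consistent with the evidence exactly when k >= n.
    """
    rest = 1 - mass_all_true
    return mass_all_true + rest * Fraction(1, 2**n)

def posterior_universal(n, mass_all_true):
    """mu(forall x. B x | B(0) and ... and B(n-1))."""
    return mass_all_true / evidence_prob(n, mass_all_true)

if __name__ == "__main__":
    for prior in (Fraction(1, 2), Fraction(0)):
        print(prior, [posterior_universal(n, prior) for n in (0, 1, 5, 20)])
```

With prior mass 1/2 the posterior equals 1/(1 + 2^(-n)) and tends to 1, while with prior mass 0 it stays at 0 however much evidence arrives, which is the role of the condition that the universal hypothesis has positive probability.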
4 Probabilities on Interpretations

We now study probabilities defined on sets of interpretations. 

Consider the set $\mathcal{I}$ of interpretations (for the alphabet). A Borel $\sigma$-algebra can be defined
on $\mathcal{I}$. For that, a topology needs to be defined first. Given some alphabet, let $\mathcal{S}$ denote the set
of sentences based on the alphabet. For each sentence $\varphi$, let $\mathit{mod}(\varphi)$ denote the set

$$\{I \in \mathcal{I} \mid \varphi \text{ is valid in } I\}.$$

Consider the set $\mathcal{B}_\mathcal{S} = \{\mathit{mod}(\varphi) \mid \varphi \in \mathcal{S}\}$. Since $\mathcal{B}_\mathcal{S}$ is closed under finite intersections, it is a
basis for a topology $\mathcal{T}$ on $\mathcal{I}$. $\mathcal{B}_\mathcal{S}$ is also an algebra, since it is closed under complementation
and finite unions, and $\mathcal{I} \in \mathcal{B}_\mathcal{S}$. Let $\mathcal{B}$ be the Borel $\sigma$-algebra formed from the topology $\mathcal{T}$ on
$\mathcal{I}$. In the following, probabilities on $\mathcal{B}$ will be considered.

Suppose that the alphabet is countable (equivalently, the set of constants is countable).
Then the set of terms and, in particular, the set $\mathcal{S}$ is countable. In this case, $\mathcal{B}_\mathcal{S}$ is countable
and hence the $\sigma$-algebra generated by $\mathcal{B}_\mathcal{S}$ is the same as the Borel $\sigma$-algebra $\mathcal{B}$ generated by $\mathcal{T}$.

Definition 28 (probability on interpretations) A function $\mu^* : \mathcal{B} \to \mathbb{R}$ is a finitely ad-
ditive probability on algebra $\mathcal{B}$ if $\mu^*(\emptyset) = 0$ and $\mu^*(\mathcal{I}) = 1$ and $\mu^*(A \cap C) + \mu^*(A \cup C) =
\mu^*(A) + \mu^*(C)$ for all $A, C \in \mathcal{B}$. It is called a Countably Additive (CA) probability or simply
a probability if additionally for all countable collections $\{A_i\}_{i \in I} \subseteq \mathcal{B}$ of pairwise disjoint sets
with $\bigcup_{i \in I} A_i \in \mathcal{B}$ it holds that $\mu^*(\bigcup_{i \in I} A_i) = \sum_{i \in I} \mu^*(A_i)$.

For CA-probabilities, $\mathcal{B}$ is usually assumed to be a Borel $\sigma$-algebra, i.e. $\bigcup_{i \in I} A_i \in \mathcal{B}$ always
holds. Countable additivity is equivalent to finite additivity and continuity:

$$\lim_{n\to\infty} \mu^*(\bigcap_{i=1}^n A_i) = \mu^*(\lim_{n\to\infty} \bigcap_{i=1}^n A_i) \quad \text{for all } A_i \in \mathcal{B}.$$

First we show that a probability on the algebra gives a probability on sentences. 

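Proposition 29 ($\mu^* \Rightarrow \mu$) Let $\mathcal{S}$ be the set of sentences, $\mathcal{I}$ the set of interpretations, $\mathcal{B}_\mathcal{S} =
\{\mathit{mod}(\varphi) \mid \varphi \in \mathcal{S}\}$ the algebra on $\mathcal{I}$, and $\mu^* : \mathcal{B}_\mathcal{S} \to \mathbb{R}$ a finitely additive probability on $\mathcal{B}_\mathcal{S}$.
Define $\mu : \mathcal{S} \to \mathbb{R}$ by

$$\mu(\varphi) = \mu^*(\mathit{mod}(\varphi)),$$

for each $\varphi \in \mathcal{S}$. Then $\mu$ is a probability on $\mathcal{S}$.

Proof. The two conditions of Definition 17 have to be established. Note that $\mu$ is non-negative
because $\mu^*$ is.

Suppose that $\varphi$ is valid. Then $\mathit{mod}(\varphi) = \mathcal{I}$, so that $\mu(\varphi) = \mu^*(\mathit{mod}(\varphi)) = \mu^*(\mathcal{I}) = 1$.
Suppose that $\neg(\varphi \wedge \psi)$ is valid. Hence $\mathit{mod}(\varphi) \cap \mathit{mod}(\psi) = \emptyset$. Thus

$\mu(\varphi \vee \psi)$

$= \mu^*(\mathit{mod}(\varphi \vee \psi))$

$= \mu^*(\mathit{mod}(\varphi) \cup \mathit{mod}(\psi))$

$= \mu^*(\mathit{mod}(\varphi)) + \mu^*(\mathit{mod}(\psi))$ [$\mu^*$ is finitely additive]

$= \mu(\varphi) + \mu(\psi).$

Hence $\mu$ is a probability.



Note that only the finite additivity of $\mu^*$ is needed in Proposition 29.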
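
The passage from a probability on sets of interpretations to a probability on sentences can be made concrete in a finite toy setting. The sketch below is an illustration under strong simplifying assumptions that are not part of the text: interpretations are truth assignments to two propositional atoms, sentences are Python predicates, and the masses are arbitrary. The names `WORLDS`, `MASS`, `mod`, `mu_star` and `mu` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Toy setting (an assumption for illustration): two propositional atoms p, q,
# so an "interpretation" is just a truth assignment (p, q).
WORLDS = list(product([False, True], repeat=2))
MASS = dict(zip(WORLDS, [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]))

def mod(phi):
    """Set of worlds in which the sentence phi (a Python predicate) is true."""
    return frozenset(w for w in WORLDS if phi(*w))

def mu_star(worlds):
    """Finitely additive probability of a set of worlds."""
    return sum((MASS[w] for w in worlds), Fraction(0))

def mu(phi):
    """Induced probability on sentences: mu(phi) = mu*(mod(phi))."""
    return mu_star(mod(phi))
```

In this sketch a valid sentence such as `lambda p, q: p or not p` receives probability 1, and for two sentences with disjoint sets of models the probability of their disjunction is the sum of their probabilities, mirroring the two conditions checked in the proof.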

Next we show that a probability on sentences gives a probability on interpretations. For 
this, a useful property of probabilities on Bs is needed. 

Proposition 30 (finite $\Leftrightarrow$ countable additivity) Let $\mathcal{S}$ be the set of sentences, $\mathcal{I}$ the set
of interpretations, and $\mathcal{B}_\mathcal{S} = \{\mathit{mod}(\varphi) \mid \varphi \in \mathcal{S}\}$ the algebra on $\mathcal{I}$. Then every finitely additive
probability on $\mathcal{B}_\mathcal{S}$ is countably additive on $\mathcal{B}_\mathcal{S}$.

Proof. Let $\mu^*$ be a finitely additive probability on $\mathcal{B}_\mathcal{S}$. Suppose that $\{\varphi_n\}_{n=1}^\infty$ is a sequence
of sentences such that $\mathit{mod}(\varphi_n) \supseteq \mathit{mod}(\varphi_{n+1})$, for $n = 1, 2, \dots$, and $\bigcap_{n=1}^\infty \mathit{mod}(\varphi_n) = \emptyset$.
Clearly $\varphi_{n+1} \to \varphi_n$ is valid, for $n = 1, 2, \dots$. Next we claim that $\varphi_{n_0}$ is unsatisfiable, for
some $n_0$. To prove this, suppose on the contrary that $\varphi_n$ is satisfiable, for $n = 1, 2, \dots$. Since
$\varphi_{n+1} \to \varphi_n$ is valid, for $n = 1, 2, \dots$, it follows that $\{\varphi_1, \dots, \varphi_n\}$ is satisfiable, for $n = 1, 2, \dots$.
By the compactness theorem, $\{\varphi_n\}_{n=1}^\infty$ is satisfiable, which contradicts the assumption that
$\bigcap_{n=1}^\infty \mathit{mod}(\varphi_n) = \emptyset$. Thus the claim that $\varphi_{n_0}$ is unsatisfiable, for some $n_0$, is proved. Since
the $\mathit{mod}(\varphi_n)$ are decreasing, we have that $\mathit{mod}(\varphi_n) = \emptyset$, for $n \ge n_0$. It thus follows that
$\lim_{n\to\infty} \mu^*(\mathit{mod}(\varphi_n)) = \mu^*(\emptyset) = 0$. Hence, by [Dud02, Theorem 3.1.1], $\mu^*$ is countably additive
on $\mathcal{B}_\mathcal{S}$. ■



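Proposition 31 ($\mu \Rightarrow \mu^*$) Let the alphabet be countable, $\mathcal{S}$ the set of sentences, $\mathcal{I}$ the set of
interpretations, and $\mathcal{B}$ the Borel $\sigma$-algebra on $\mathcal{I}$. Let $\mu : \mathcal{S} \to \mathbb{R}$ be a probability on sentences.
Then there exists a unique probability $\mu^* : \mathcal{B} \to \mathbb{R}$ such that

$$\mu^*(\mathit{mod}(\varphi)) = \mu(\varphi),$$

for each $\varphi \in \mathcal{S}$.

Proof. Consider the algebra $\mathcal{B}_\mathcal{S} = \{\mathit{mod}(\varphi) \mid \varphi \in \mathcal{S}\}$. Define $\mu^* : \mathcal{B}_\mathcal{S} \to \mathbb{R}$ by

$$\mu^*(\mathit{mod}(\varphi)) = \mu(\varphi),$$

for each $\varphi \in \mathcal{S}$. Suppose that $\varphi$ and $\psi$ are sentences such that $\mathit{mod}(\varphi) = \mathit{mod}(\psi)$. Then $\varphi = \psi$
is valid, and so $\mu(\varphi) = \mu(\psi)$. This shows that $\mu^*$ is well-defined on basic sets.
Clearly $\mu^*(\mathcal{I}) = \mu^*(\mathit{mod}(\top)) = \mu(\top) = 1$.

Next it is shown that $\mu^*$ is finitely additive on the algebra $\mathcal{B}_\mathcal{S}$. Let $\{\mathit{mod}(\varphi_i)\}_{i=1}^n$ be a finite
collection of pairwise disjoint sets in $\mathcal{B}_\mathcal{S}$. Suppose that, for some $i$ and $j$, $\neg(\varphi_i \wedge \varphi_j)$ is not
valid. Hence $\varphi_i \wedge \varphi_j$ has a model, and so $\mathit{mod}(\varphi_i) \cap \mathit{mod}(\varphi_j) \ne \emptyset$. Thus $\mathit{mod}(\varphi_i) \cap \mathit{mod}(\varphi_j) = \emptyset$
implies $\neg(\varphi_i \wedge \varphi_j)$ is valid. Then

$$\mu^*(\bigcup_{i=1}^n \mathit{mod}(\varphi_i)) = \mu^*(\mathit{mod}(\bigvee_{i=1}^n \varphi_i)) = \mu(\bigvee_{i=1}^n \varphi_i) = \sum_{i=1}^n \mu(\varphi_i) = \sum_{i=1}^n \mu^*(\mathit{mod}(\varphi_i)),$$

where the second last equality follows from Part 6 of Proposition 19. Thus $\mu^*$ is finitely additive
on $\mathcal{B}_\mathcal{S}$.

Now, by Proposition 30, $\mu^*$ is countably additive on $\mathcal{B}_\mathcal{S}$. Since the alphabet is countable,
$\mathcal{B}_\mathcal{S}$ is countable, and so the Borel $\sigma$-algebra $\mathcal{B}$ generated by the topology on $\mathcal{I}$ is the same as
the $\sigma$-algebra generated by $\mathcal{B}_\mathcal{S}$. By Carathéodory's theorem [Dud02, Theorem 3.1.4], there is
a unique extension of $\mu^*$ to the Borel $\sigma$-algebra $\mathcal{B}$ on $\mathcal{I}$. ■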



A probability $\mu^* : \mathcal{B} \to \mathbb{R}$ on sets of interpretations has the following intended meaning:

For a Borel set $B \in \mathcal{B}$, $\mu^*(B)$ is the degree of belief that the intended interpretation
is a member of $B$.

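We now consider probabilities defined on sets of separating interpretations. Let $\widehat{\mathcal{I}}$ be the set
of separating interpretations (for the alphabet). A Borel $\sigma$-algebra can be defined on $\widehat{\mathcal{I}}$. For
that, a topology needs to be defined first. For each sentence $\varphi$, let $\widehat{\mathit{mod}}(\varphi)$ denote the set

$$\{I \in \widehat{\mathcal{I}} \mid \varphi \text{ is valid in } I\}.$$

Consider the set $\widehat{\mathcal{B}}_\mathcal{S} = \{\widehat{\mathit{mod}}(\varphi) \mid \varphi \in \mathcal{S}\}$. Since $\widehat{\mathcal{B}}_\mathcal{S}$ is closed under finite intersections, it is a
basis for a topology $\widehat{\mathcal{T}}$ on $\widehat{\mathcal{I}}$. $\widehat{\mathcal{B}}_\mathcal{S}$ is also an algebra, since it is closed under complementation
and finite unions, and $\widehat{\mathcal{I}} \in \widehat{\mathcal{B}}_\mathcal{S}$. Let $\widehat{\mathcal{B}}$ be the Borel $\sigma$-algebra formed from the topology $\widehat{\mathcal{T}}$ on
$\widehat{\mathcal{I}}$. In the following, probabilities on $\widehat{\mathcal{B}}$ will be considered. The Gaifman condition is crucial for
them to be CA, since $\widehat{\mathcal{B}}$ is not compact, unlike $\mathcal{B}$.

Suppose that the alphabet is countable. Then the set of terms and, in particular, the set
$\mathcal{S}$ is countable. In this case, $\widehat{\mathcal{B}}_\mathcal{S}$ is countable and hence the $\sigma$-algebra generated by $\widehat{\mathcal{B}}_\mathcal{S}$ is the
same as the Borel $\sigma$-algebra $\widehat{\mathcal{B}}$ generated by $\widehat{\mathcal{T}}$.

Note that there is a one-to-one correspondence between the set of probabilities on $\widehat{\mathcal{B}}$ and
the set of probabilities on $\mathcal{B}$ which give measure 0 to the set of non-separating interpretations.
(The set of non-separating interpretations, and hence the set of separating interpretations, are
shown to be $\mathcal{B}$-measurable in the proof of Proposition 33 below.) A probability $\hat\mu^* : \widehat{\mathcal{B}} \to \mathbb{R}$ can
be extended to a probability $\mu^* : \mathcal{B} \to \mathbb{R}$ defined by $\mu^*(B) = \hat\mu^*(B \cap \widehat{\mathcal{I}})$, for each $B \in \mathcal{B}$. Note
that $\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0$. Conversely, a probability $\mu^* : \mathcal{B} \to \mathbb{R}$ having the property that $\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0$
can be restricted to a probability $\mu^*|_{\widehat{\mathcal{B}}} : \widehat{\mathcal{B}} \to \mathbb{R}$ defined by $\mu^*|_{\widehat{\mathcal{B}}}(B) = \mu^*(B)$, for each $B \in \widehat{\mathcal{B}}$.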

The next result shows that a probability on the set of separating interpretations gives a 
Gaifman probability on sentences. 

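Proposition 32 (separating $\hat\mu^* \Rightarrow \mu$ Gaifman) Let the alphabet be countable, $\mathcal{S}$ the set of
sentences, $\widehat{\mathcal{I}}$ the set of separating interpretations, and $\hat\mu^* : \widehat{\mathcal{B}} \to \mathbb{R}$ a probability on the Borel
$\sigma$-algebra $\widehat{\mathcal{B}}$ on $\widehat{\mathcal{I}}$. Define $\mu : \mathcal{S} \to \mathbb{R}$ by

$$\mu(\varphi) = \hat\mu^*(\widehat{\mathit{mod}}(\varphi)),$$

for each $\varphi \in \mathcal{S}$. Then $\mu$ is a Gaifman probability on $\mathcal{S}$.

Proof. First, the two conditions of Definition 17 have to be established. Note that $\mu$ is
non-negative because $\hat\mu^*$ is.

1. Suppose that $\varphi$ is valid. Then $\widehat{\mathit{mod}}(\varphi) = \widehat{\mathcal{I}}$, so that $\mu(\varphi) = \hat\mu^*(\widehat{\mathit{mod}}(\varphi)) = \hat\mu^*(\widehat{\mathcal{I}}) = 1$.

2. Suppose that $\neg(\varphi \wedge \psi)$ is valid. Hence $\widehat{\mathit{mod}}(\varphi) \cap \widehat{\mathit{mod}}(\psi) = \emptyset$. Thus

$\mu(\varphi \vee \psi)$

$= \hat\mu^*(\widehat{\mathit{mod}}(\varphi \vee \psi))$

$= \hat\mu^*(\widehat{\mathit{mod}}(\varphi) \cup \widehat{\mathit{mod}}(\psi))$

$= \hat\mu^*(\widehat{\mathit{mod}}(\varphi)) + \hat\mu^*(\widehat{\mathit{mod}}(\psi))$ [$\hat\mu^*$ is finitely additive]

$= \mu(\varphi) + \mu(\psi).$

Hence $\mu$ is a probability.

Let $r$ and $s$ be closed terms of type $\alpha \to \beta$ and $t_1, t_2, \dots$ an enumeration of all closed terms
of type $\alpha$. Then

$$\widehat{\mathit{mod}}(r = s) = \bigcap_{i=1}^\infty \widehat{\mathit{mod}}((r\ t_i) = (s\ t_i)).$$

To see this, suppose first that $I \in \widehat{\mathit{mod}}(r = s)$. Then clearly $I \in \widehat{\mathit{mod}}((r\ t_i) = (s\ t_i))$, for each
$t_i$. Conversely, suppose that $I$ is a separating interpretation such that $I \notin \widehat{\mathit{mod}}(r = s)$. Since $I$
is separating, there exists a closed term $t_j$ such that $I \notin \widehat{\mathit{mod}}((r\ t_j) = (s\ t_j))$. Hence
$I \notin \bigcap_{i=1}^\infty \widehat{\mathit{mod}}((r\ t_i) = (s\ t_i))$. [Note, by the way, that $\mathit{mod}(r = s) \ne \bigcap_{i=1}^\infty \mathit{mod}((r\ t_i) = (s\ t_i))$.]

Since $\forall x.\varphi$ is logically equivalent to $\lambda x.\varphi = \lambda x.\top$, it follows immediately from the remark
of the preceding paragraph that

$$\widehat{\mathit{mod}}(\forall x.\varphi) = \bigcap_{i=1}^\infty \widehat{\mathit{mod}}(\varphi\{x/t_i\}).$$

Thus $\mu(\forall x.\varphi) = \hat\mu^*(\widehat{\mathit{mod}}(\forall x.\varphi))$

$= \hat\mu^*(\bigcap_{i=1}^\infty \widehat{\mathit{mod}}(\varphi\{x/t_i\}))$

$= \lim_{n\to\infty} \hat\mu^*(\bigcap_{i=1}^n \widehat{\mathit{mod}}(\varphi\{x/t_i\}))$ [$\hat\mu^*$ is countably additive]

$= \lim_{n\to\infty} \hat\mu^*(\widehat{\mathit{mod}}(\bigwedge_{i=1}^n \varphi\{x/t_i\}))$
$= \lim_{n\to\infty} \mu(\bigwedge_{i=1}^n \varphi\{x/t_i\}),$

and so $\mu$ is Gaifman, by Proposition 23. ■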

A probability $\hat\mu^* : \widehat{\mathcal{B}} \to \mathbb{R}$ on sets of separating interpretations has the following intended
meaning:

For a Borel set $B \in \widehat{\mathcal{B}}$, $\hat\mu^*(B)$ is the degree of belief that the intended (separating)
interpretation is a member of $B$.

Next we show that a Gaifman probability on sentences gives a probability on separating
interpretations.

Proposition 33 (Gaifman $\mu \Rightarrow \hat\mu^*$ separating) Let the alphabet be countable, $\mathcal{S}$ the set of
sentences, $\widehat{\mathcal{I}}$ the set of separating interpretations, and $\widehat{\mathcal{B}}$ the Borel $\sigma$-algebra on $\widehat{\mathcal{I}}$. Let $\mu : \mathcal{S} \to \mathbb{R}$
be a Gaifman probability on sentences. Then there exists a unique probability $\hat\mu^* : \widehat{\mathcal{B}} \to \mathbb{R}$ such
that

$$\hat\mu^*(\widehat{\mathit{mod}}(\varphi)) = \mu(\varphi),$$

for each $\varphi \in \mathcal{S}$.
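Proof. Let $\mathcal{I}$ be the set of interpretations. Then $\mathcal{I} \setminus \widehat{\mathcal{I}}$ is the set of all non-separating inter-
pretations. First we show that $\mathcal{I} \setminus \widehat{\mathcal{I}}$ is $\mathcal{B}$-measurable. Let $r$ and $s$ be closed terms of the same
function type, say, $\alpha \to \beta$, and $t_1, t_2, \dots$ an enumeration of all closed terms of type $\alpha$. Then

$$\mathit{mod}(r \ne s) \cap \bigcap_{i=1}^\infty \mathit{mod}((r\ t_i) = (s\ t_i))$$

is a measurable set of non-separating interpretations. Since there are countably many such
pairs $r$ and $s$, and since

$$\mathcal{I} \setminus \widehat{\mathcal{I}} = \bigcup_{r,s} \Bigl[\mathit{mod}(r \ne s) \cap \bigcap_{i=1}^\infty \mathit{mod}((r\ t_i) = (s\ t_i))\Bigr],$$

it follows immediately that $\mathcal{I} \setminus \widehat{\mathcal{I}}$ is measurable.

According to Proposition 31 there is a unique probability $\mu^* : \mathcal{B} \to \mathbb{R}$ such that

$$\mu^*(\mathit{mod}(\varphi)) = \mu(\varphi),$$

for each $\varphi \in \mathcal{S}$. We now show that $\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0$:

$\mu^*(\mathit{mod}(r \ne s) \cap \bigcap_{i=1}^\infty \mathit{mod}((r\ t_i) = (s\ t_i)))$

$= \mu^*(\bigcap_{i=1}^\infty \mathit{mod}((r\ t_i) = (s\ t_i))) - \mu^*(\mathit{mod}(r = s))$

$= \lim_{n\to\infty} \mu^*(\bigcap_{i=1}^n \mathit{mod}((r\ t_i) = (s\ t_i))) - \mu(r = s)$ [$\mu^*$ is countably additive]

$= \lim_{n\to\infty} \mu^*(\mathit{mod}(\bigwedge_{i=1}^n ((r\ t_i) = (s\ t_i)))) - \mu(r = s)$

$= \lim_{n\to\infty} \mu(\bigwedge_{i=1}^n ((r\ t_i) = (s\ t_i))) - \mu(r = s)$

$= \mu(r = s) - \mu(r = s)$ [$\mu$ is Gaifman]

$= 0.$

Since $\mathcal{I} \setminus \widehat{\mathcal{I}}$ is a countable union of such sets, $\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0$.

Note that $\widehat{\mathcal{B}} \subseteq \mathcal{B}$, since $\widehat{\mathcal{I}}$ is measurable. Define $\hat\mu^* : \widehat{\mathcal{B}} \to \mathbb{R}$ to be the restriction of $\mu^*$ to $\widehat{\mathcal{B}}$.
Then, for each $\varphi \in \mathcal{S}$,

$\hat\mu^*(\widehat{\mathit{mod}}(\varphi))$

$= \mu^*(\mathit{mod}(\varphi) \cap \widehat{\mathcal{I}})$

$= \mu^*(\mathit{mod}(\varphi)) - \mu^*(\mathit{mod}(\varphi) \cap (\mathcal{I} \setminus \widehat{\mathcal{I}}))$

$= \mu^*(\mathit{mod}(\varphi))$ [$\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0$]

$= \mu(\varphi).$

Also $\hat\mu^*(\widehat{\mathcal{I}}) = \mu^*(\widehat{\mathcal{I}}) = \mu^*(\mathcal{I}) - \mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = \mu^*(\mathcal{I}) = 1$, so that $\hat\mu^*$ is a probability. ■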



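Propositions 32 and 33 imply

Corollary 34 ($\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0 \Leftrightarrow \mu$ Gaifman) For a countable alphabet and any probability
$\mu : \mathcal{S} \to \mathbb{R}$ on sentences and probability $\mu^* : \mathcal{B} \to \mathbb{R}$ on interpretations (one-to-one) related by
$\mu^*(\mathit{mod}(\varphi)) = \mu(\varphi)$ it holds that: $\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0 \Leftrightarrow \mu$ is Gaifman.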
There is a concept of being strongly Cournot for probabilities on sets of interpretations that
corresponds to that of being strongly Cournot for probabilities on sentences.

Definition 35 (strongly Cournot $\mu^*$) A probability $\mu^* : \mathcal{B} \to \mathbb{R}$ is strongly Cournot if, for
each $\varphi \in \mathcal{S}$, $\varphi$ is satisfiable implies $\mu^*(\mathit{mod}(\varphi)) > 0$.

Proposition 36 (strongly Cournot $\mu^* \Leftrightarrow \mu$) Let $\mathcal{S}$ be the set of sentences and $\mathcal{I}$ the set
of interpretations. Suppose that $\mu^* : \mathcal{B} \to \mathbb{R}$, a probability on the Borel $\sigma$-algebra on $\mathcal{I}$, and
$\mu : \mathcal{S} \to \mathbb{R}$, a probability on sentences, are related by

$$\mu(\varphi) = \mu^*(\mathit{mod}(\varphi)),$$

for each $\varphi \in \mathcal{S}$. Then $\mu$ is a strongly Cournot probability on sentences iff $\mu^*$ is a strongly
Cournot probability on sets of interpretations.

Proof. Suppose that $\mu$ is a strongly Cournot probability on sentences. Let $\varphi$ be a satisfiable
sentence. Then $\mu^*(\mathit{mod}(\varphi)) = \mu(\varphi) > 0$, and so $\mu^*$ is a strongly Cournot probability.

Conversely, suppose that $\mu^*$ is a strongly Cournot probability on sets of interpretations.
Let $\varphi$ be a satisfiable sentence. Then $\mu(\varphi) = \mu^*(\mathit{mod}(\varphi)) > 0$, and so $\mu$ is a strongly Cournot
probability. ■

As with probabilities on sentences, we can also define a Cournot condition for probabilities 
on sets of separated interpretations. 

Definition 37 (Cournot $\mu^*$) A probability $\mu^* : \mathcal{B} \to \mathbb{R}$ is Cournot if, for each $\varphi \in \mathcal{S}$, $\varphi$ has
a separating model implies $\mu^*(\mathit{mod}(\varphi)) > 0$.

Clearly every strongly Cournot probability is Cournot.

Proposition 38 (Cournot $\mu^* \Leftrightarrow \mu$) Let $\mathcal{S}$ be the set of sentences and $\mathcal{I}$ the set of interpre-
tations. Suppose that $\mu^* : \mathcal{B} \to \mathbb{R}$, a probability on the Borel $\sigma$-algebra $\mathcal{B}$ on $\mathcal{I}$, and $\mu : \mathcal{S} \to \mathbb{R}$,
a probability on sentences, are related by

$$\mu(\varphi) = \mu^*(\mathit{mod}(\varphi)),$$

for each $\varphi \in \mathcal{S}$. Then $\mu$ is a Cournot probability on sentences iff $\mu^*$ is a Cournot probability
on sets of interpretations.

Proof. Suppose that $\mu$ is a Cournot probability on sentences. Let $\varphi$ be a sentence having a
separating model. Then $\mu^*(\mathit{mod}(\varphi)) = \mu(\varphi) > 0$, and so $\mu^*$ is a Cournot probability.

Conversely, suppose that $\mu^*$ is a Cournot probability on sets of interpretations. Let $\varphi$ be
a sentence having a separating model. Then $\mu(\varphi) = \mu^*(\mathit{mod}(\varphi)) > 0$, and so $\mu$ is a Cournot
probability. ■
5 Existence of Probabilities



Now we turn to the issue of the existence of probabilities. 

Definition 39 (discrete $\mu^*$) A probability $\mu^* : \mathcal{B} \to \mathbb{R}$ is discrete if there exists a count-
able set of interpretations $\{I_i\}_{i=1}^\infty$ and a set of non-negative real numbers $\{m_i\}_{i=1}^\infty$ such that
$\sum_{i=1}^\infty m_i = 1$ and, for each Borel set $B$, $\mu^*(B) = \sum_{i : I_i \in B} m_i$.

Each $m_i$ is called a mass. Clearly, a discrete probability is a probability on the Borel
$\sigma$-algebra $\mathcal{B}$. The set $\{I_i\}_{i=1}^\infty$ is called the support of the probability.

Theorem 40 (Cournot and Gaifman probability) If the alphabet is countable, there ex- 
ists a probability on sentences that is Cournot and Gaifman. 



Proof. Consider an enumeration $\chi_1, \chi_2, \dots$ of the countable set of sentences which have a
separating model. Choose a separating interpretation $I_i$ in $\mathit{mod}(\chi_i)$ and assign the mass $m_i = \frac{1}{i(i+1)}$
to $I_i$, for $i = 1, 2, \dots$.

Define $\mu^* : \mathcal{B} \to \mathbb{R}$ to be the discrete probability defined by the masses assigned to this
countable set of interpretations. That is, for a Borel set $B \in \mathcal{B}$, $\mu^*(B) = \sum_{i : I_i \in B} \frac{1}{i(i+1)}$ is the
sum of the masses of the subset of separating interpretations in $\{I_i\}_{i=1}^\infty$ that are members of
$B$. It is possible that the same interpretation is chosen for more than one $\mathit{mod}(\chi_i)$; in this
case, the masses corresponding to each choice of that interpretation are added together. $\mu^*$ is
a probability, since it is a countable sum of point masses, and $\mu^*(\mathcal{I}) = \sum_{i=1}^\infty \frac{1}{i(i+1)} = 1$. Since,
for all $i$, $\mu^*(\mathit{mod}(\chi_i)) \ge \frac{1}{i(i+1)} > 0$, $\mu^*$ is Cournot.

Now define $\mu : \mathcal{S} \to \mathbb{R}$ by $\mu(\varphi) = \mu^*(\mathit{mod}(\varphi))$, for $\varphi \in \mathcal{S}$. By Proposition 29, $\mu$ is a
probability on sentences. Also, by Proposition 38, $\mu$ is Cournot. Finally, note that, if $\mathcal{I}$ is
the set of interpretations and $\widehat{\mathcal{I}}$ the set of separating interpretations, then $\mu^*(\mathcal{I} \setminus \widehat{\mathcal{I}}) = 0$.
Consequently, the restriction of $\mu^*$ to $\widehat{\mathcal{B}}$ is a probability on $\widehat{\mathcal{B}}$ and $\mu(\varphi) = \mu^*(\widehat{\mathit{mod}}(\varphi))$, for
$\varphi \in \mathcal{S}$. Thus, by Proposition 32, $\mu$ is Gaifman. ■



Note that the support of the discrete probability $\mu^*$ constructed in Theorem 40 is a dense
subset of $\widehat{\mathcal{I}}$, since there is a point from the support of the probability in each set in a basis for
its topology. Every class of separating models that can be characterized by a finite number of
axioms can also be characterized by a single sentence, and hence is assigned a non-zero probability.
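
The bookkeeping behind such a discrete probability can be sketched in a few lines of Python. This is only an illustration: it assumes the masses are 1/(i(i+1)), whose sum telescopes to 1, and it uses string labels as hypothetical stand-ins for the chosen interpretations. The function names are not from the text.

```python
from collections import defaultdict
from fractions import Fraction

def mass(i):
    """Assumed mass for the i-th sentence (i = 1, 2, ...): 1 / (i (i + 1))."""
    return Fraction(1, i * (i + 1))

def discrete_probability(chosen):
    """Point masses from chosen[i-1], the label of the interpretation picked
    for the i-th sentence. Masses of repeated choices are added together."""
    masses = defaultdict(Fraction)
    for i, interpretation in enumerate(chosen, start=1):
        masses[interpretation] += mass(i)
    return dict(masses)

def partial_sum(n):
    """Sum of the first n masses; telescoping gives n / (n + 1)."""
    return sum((mass(i) for i in range(1, n + 1)), Fraction(0))
```

For example, `discrete_probability(["I1", "I2", "I1"])` merges the first and third masses on the label `"I1"`, and `partial_sum(n)` approaches 1 as n grows, so the total mass is 1 in the limit.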

Proposition 41 (strongly Cournot probability) If the alphabet is countable, there exists 
a probability on sentences that is strongly Cournot. 

Proof. Consider an enumeration $\chi_1, \chi_2, \dots$ of the countable set of sentences which have a model.
Choose an interpretation $I_i$ in $\mathit{mod}(\chi_i)$ and assign the mass $\frac{1}{i(i+1)}$ to $I_i$, for $i = 1, 2, \dots$.

Define $\mu^* : \mathcal{B} \to \mathbb{R}$ to be the discrete probability defined by $\mu^*(B) = \sum_{i : I_i \in B} \frac{1}{i(i+1)}$ for $B \in \mathcal{B}$.
$\mu^*$ is a probability, since it is a countable sum of point masses, and $\mu^*(\mathcal{I}) = \sum_{i=1}^\infty \frac{1}{i(i+1)} = 1$.
Since, for all $i$, $\mu^*(\mathit{mod}(\chi_i)) \ge \frac{1}{i(i+1)} > 0$, $\mu^*$ is strongly Cournot.

Now define $\mu : \mathcal{S} \to \mathbb{R}$ by $\mu(\varphi) = \mu^*(\mathit{mod}(\varphi))$, for $\varphi \in \mathcal{S}$. By Proposition 29, $\mu$ is a
probability on sentences. Also, by Proposition 36, $\mu$ is strongly Cournot. ■



Now we give some illustrative examples concerning the various classes of probabilities that 
have been introduced. 
Example 42 (a probability which is not Gaifman) Choose an alphabet for which there
exists a non-separating interpretation. Construct $\mu^*$ by putting unit mass on some non-
separating interpretation. The probability on sentences corresponding to $\mu^*$ is not Gaifman
by Corollary 34.

Here is such an alphabet and interpretation. Let there be no non-logical constants in the
alphabet. Let the interpretation $I$ be the standard model defined as follows. The domain
$\mathcal{D}_\iota = \{d\}$. Each $\mathcal{D}_{\alpha\to\beta}$ consists of all functions from $\mathcal{D}_\alpha$ to $\mathcal{D}_\beta$. Note that $d$ is not the
denotation of any closed term of type $\iota$. Now consider $\lambda x.\top$ and $\lambda x.\bot$, each having type $\iota \to o$.
Clearly $\mathcal{V}(\lambda x.\top, I) \ne \mathcal{V}(\lambda x.\bot, I)$. However, there does not exist a closed term $t$ of type $\iota$ such
that $\mathcal{V}((\lambda x.\top\ t), I) \ne \mathcal{V}((\lambda x.\bot\ t), I)$. Hence $I$ is not a separating interpretation. ◇



Theorem 40 shows that, for any countable alphabet, there is always a probability which
is Cournot and Gaifman. The next example shows that it is not guaranteed that there is a
probability which is strongly Cournot and Gaifman, because these two concepts may conflict
on non-separating interpretations.

Example 43 (a probability which is strongly Cournot but not Gaifman) Choose an
alphabet for which there exists a non-separating interpretation. Construct $\mu^*$ by forming an
enumeration $\varphi_1, \varphi_2, \dots$ of all satisfiable sentences, and putting mass $\frac{1}{2}$ on some non-separating
interpretation and, for each $i$, mass $\frac{1}{(i+1)(i+2)}$ on an interpretation in $\mathit{mod}(\varphi_i)$. The probability
on sentences corresponding to $\mu^*$ is strongly Cournot, but not Gaifman. ◇



Example 44 (a probability which is Gaifman but not Cournot) Choose an alphabet
for which there exist two disjoint sentences each having a separating model. Construct $\mu^*$ by
putting unit mass on a separating model of one of the sentences. The probability on sentences
corresponding to $\mu^*$ is Gaifman but not Cournot.

Here is such an alphabet and pair of sentences. Let $d$ be any element, $\mathcal{D}_\iota = \{d\}$, and, for
definiteness, each domain $\mathcal{D}_{\alpha\to\beta}$ the set of all functions from $\mathcal{D}_\alpha$ to $\mathcal{D}_\beta$. Each of the domains
$\mathcal{D}_\alpha$ is finite. Let there be a non-logical constant $a$ of type $\iota$ such that $\mathcal{V}(a) = d$. The domain
$\mathcal{D}_{\iota\to o}$ consists of two functions, one that maps $d$ to T and is the denotation of $\lambda x.\top$, and one
that maps $d$ to F and is the denotation of $\lambda x.\bot$. For each element of each of the domains $\mathcal{D}_{\alpha\to\beta}$
(other than $\mathcal{D}_{\iota\to o}$) introduce a non-logical constant of a suitable type into the alphabet in such a
way that the denotation of the constant is the corresponding function. Note that every element
of every domain is the denotation of a closed term. Now introduce a non-logical constant $p$ of
type $\iota \to o$. For the interpretation $I_1$, take everything defined so far and give $p$ the denotation
$d \mapsto$ T. Then $I_1$ is a separating model of the sentence $(p\ a)$. On the other hand, for the
interpretation $I_2$ take everything defined so far except give $p$ the denotation $d \mapsto$ F. Then $I_2$ is
a separating model of the sentence $\neg(p\ a)$. Finally, note that $(p\ a)$ and $\neg(p\ a)$ are disjoint.

An example below provides another such alphabet and sentence, but with infinite domain
$\{0, 1, 2, \dots\}$: there, $\forall x.(B\ x)$ and $\neg\forall x.(B\ x)$ each have a separating model, say $I$ and
$I'$. Hence we can set $\mu^*(\{I'\}) = 1$, which implies $\mu(\forall x.(B\ x)) = 0$ and so $\mu$ cannot confirm
$\forall x.(B\ x)$. Note that $\mu$ is Gaifman by Corollary 34 but not Cournot. ◇

Example 45 (a probability which is Cournot but not strongly Cournot) Choose an
alphabet for which there is a sentence having a non-empty set of models all of which are non-
separating. Construct $\mu^*$ by forming an enumeration $\varphi_1, \varphi_2, \dots$ of all sentences that have a
separating model and putting mass $\frac{1}{i(i+1)}$ on a separating interpretation in $\mathit{mod}(\varphi_i)$, for each $i$.
The probability on sentences corresponding to $\mu^*$ is Cournot but not strongly Cournot.

Here is such an alphabet and sentence. Let the alphabet contain the non-logical constants
$a$ of type $\iota$ and $p$ of type $\iota \to o$. Consider the sentence $\varphi = \exists x.(\neg(p\ x) \wedge (p\ a))$, which has a
model. Let $I$ be any model for $\varphi$. Then for $I$ the domain $\mathcal{D}_\iota$ must have at least two elements,
one of which is the denotation of $a$ and where none of the others is the denotation of a closed
term of type $\iota$. Clearly $\mathcal{V}(p, I) \ne \mathcal{V}(\lambda x.\top, I)$. However, there does not exist a closed term $t$
of type $\iota$ such that $\mathcal{V}((p\ t), I) \ne \mathcal{V}((\lambda x.\top\ t), I)$. Hence $I$ is not a separating interpretation. ◇

Example 46 (standard interpretation of Nat) This continues Example 24. As non-
logical constants in our theory we consider $0 : Nat$ and $S : Nat \to Nat$, and abbreviate
$n = S^n(0) = (S\ (S\ (S \cdots (S\ 0))))$. The standard interpretation $I$ is defined as follows:
The domain $\mathcal{D}_{Nat} = \{0, 1, 2, \dots\}$, and each domain $\mathcal{D}_{\alpha\to\beta}$ is the set of all functions from
$\mathcal{D}_\alpha$ to $\mathcal{D}_\beta$. We interpret $\mathcal{V}(n, I) = n$, and $\mathcal{V}(S) : \mathcal{D}_{Nat} \to \mathcal{D}_{Nat}$ is the successor function
mapping $n$ to $n + 1$. This interpretation satisfies the Peano axioms $\forall x.(S\ x) \ne 0$ and
$\forall x.\forall y.((S\ x) = (S\ y)) \to (x = y)$ and $\forall p.(((p\ 0) \wedge \forall x.((p\ x) \to (p\ (S\ x)))) \to \forall x.(p\ x))$.
We can add to our logic any number of constants of type $Nat \to o$. Let $\mathcal{J}$ be the set of
interpretations obtained by augmenting $I$ with any valuation of these new constants. Every
interpretation in $\mathcal{J}$ (still) satisfies the Peano axioms. Here and in later examples we only add
one such predicate $B : Nat \to o$, used for induction. For any probability $\mu^*$ that concentrates
on $\mathcal{J}$, i.e. $\mu^*(\mathcal{J}) = 1$, $\mu(\forall x.(\varphi\ x)) = \lim_{n\to\infty} \mu((\varphi\ 0) \wedge \dots \wedge (\varphi\ n))$ holds for every closed term
$\varphi$ of type $Nat \to o$, and in particular for $B$. ◇

Example 47 (non-standard interpretation of Nat) Consider the previous example and modify the
interpretation $I$ to $I'$ as follows: Expand $\mathcal{D}_{Nat}$ to $\mathcal{D}'_{Nat} = \{0, 1, 2, \dots\} \cup \{\dots, \tilde{-2}, \tilde{-1}, \tilde 0, \tilde 1, \tilde 2, \dots\}$
and let $\mathcal{V}(S)$ map $\tilde n \mapsto \widetilde{n+1}$ in addition to $n \mapsto n + 1$. We call $\tilde n \in \{\dots, \tilde{-2}, \tilde{-1}, \tilde 0, \tilde 1, \tilde 2, \dots\}$
non-standard numbers. As before, augment $I'$ by an interpretation of $B$. Here we only
consider valuations $\mathcal{V}(B)$ that are true everywhere, except on a single non-standard number,
say $c$. This leads to a non-separating interpretation $I'$, since $\exists x.\neg(B\ x)$ is valid in $I'$ but
there is no closed term $t$ for which $\neg(B\ t)$ is. Note that every closed term of type $Nat$
has some standard number $n$ as denotation. For a point probability $\mu^*$ that concentrates
on $I'$ we therefore have $\mu(\forall x.(B\ x)) = 0$ but $\mu((B\ t)) = 1$ for all closed terms $t$ of type
$Nat$. Hence $\mu$ is not Gaifman and cannot confirm $\forall x.(B\ x)$. Note that $I'$ even satisfies the
"Peano" axioms if either $\forall p$ is replaced by "for all closed terms $p$ of type $Nat \to o$" or a
suitable subset of $\{T, F\}^{\mathcal{D}'_{Nat}}$ is chosen for $\mathcal{D}'_{Nat \to o}$ (this is due to the absence of $+$ and $\times$). ◇
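
The failure of the Gaifman condition under a point mass on such a non-standard interpretation can be checked mechanically in a simplified encoding. In this sketch, which is an assumption-laden illustration rather than a faithful model of the logic, an interpretation is summarised by the set of domain elements at which B is false, standard numbers are Python ints, and the string `"c"` is a hypothetical stand-in for the non-standard number that no closed term denotes.

```python
# An interpretation is summarised by the set of domain elements where B is false.
# Standard numbers are Python ints; the string "c" is a hypothetical stand-in
# for a non-standard number that no closed term denotes.

def forall_B_holds(false_at):
    """'forall x. (B x)' holds iff B is false nowhere in the domain."""
    return len(false_at) == 0

def B_holds_at_numeral(n, false_at):
    """'(B n)' for the numeral n, which denotes the standard number n."""
    return n not in false_at

def point_mass(truth_value):
    """Probability of a sentence under unit mass on a single interpretation."""
    return 1 if truth_value else 0

I_PRIME = frozenset({"c"})  # B fails only at the non-standard element c

def mu_forall():
    """mu(forall x. (B x)) under the point mass on I_PRIME."""
    return point_mass(forall_B_holds(I_PRIME))

def mu_conjunction(n):
    """mu((B 0) and ... and (B n-1)) under the point mass on I_PRIME."""
    return point_mass(all(B_holds_at_numeral(i, I_PRIME) for i in range(n)))
```

Here every finite conjunction over numerals has probability 1 while the universal sentence has probability 0, so the limit of the conjunctions differs from the probability of the universal sentence.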

Example 48 (the description operator $\iota$) We can use the previous Example to
illustrate the complications a description operator $\iota$ causes. Let the constant $\iota_{(Nat \to o) \to Nat}$ de-
note a function that selects the unique member of a singleton set ($(\iota\ (\lambda x.(y = x))) = y$).
Since $\mathcal{V}((\iota\ \neg B), I') = c$, $(\iota\ \neg B) = n$ is not valid in $I'$ for any standard number $n$, and
$\mu(B\ (\iota\ \neg B)) = 0$. Indeed, $\iota$ makes accessible all non-standard numbers via $c{+}k = S^k(\iota\ \neg B)$
and $c{-}k = (\iota\ \lambda x.(\neg B\ S^k(x)))$. Hence $I'$ is now separating for type $Nat$ and all non-standard
numbers must be included in the enumeration of terms in the Gaifman condition, even if
we only care about the standard interpretation. We do not know how to avoid this problem,
e.g. by adding additional axioms that constrain $\iota$. On the other hand, $\iota$ can easily be eliminated
from the logic (the basic idea is that formulas like $(p\ (\iota\ B))$ can be replaced by something like
$(\exists! x.(B\ x) \wedge (p\ x)) \vee (\neg\exists! x.(B\ x) \wedge (p\ 0))$). ◇



At least asymptotically, the Cournot and Gaifman probabilities constructed in the proof of 
Theorem HQ] are good priors for sentences, since they are non-dogmatic [GS82]. We will use 
them in Sections 6 and 7, called $\xi$ there, to construct minimally more informative distributions
given some background knowledge like non-logical axioms.

After having seen various examples of (non) Cournot and (non) Gaifman probabilities, we 
now give a general characterization of Gaifman and Cournot probabilities. 

Definition 49 (rigid mixture representation) Let $\chi_1, \chi_2, ...$ be an enumeration of all
sentences that have a separating model. We say that a probability $\mu : \mathcal{S} \to \mathbb{R}$ on sentences has
a rigid mixture representation iff $\mu(\varphi) = \sum_{i=1}^{\infty} m_i \mu_i(\varphi)$ for some $\{m_i > 0\}$ with $\sum_i m_i = 1$ and
probabilities $\mu_i$ satisfying $\mu_i(\chi_i) = 1$ (hence $\mu_i(\neg\chi_i) = 0$).



Theorem 50 (probability characterization - Gaifman and Cournot) 

Let $\mu$ be a probability on sentences. Then

$\mu$ is Cournot (and Gaifman) $\iff$ $\mu$ has a rigid mixture representation (and all $\mu_i$ in Definition 49 are Gaifman).

This result eases the construction of Cournot $\mu$, in that it reduces the problem of finding
a single $\mu$ that simultaneously satisfies the infinitely many conditions $\mu(\chi_i) > 0$ $\forall \chi_i$ to the
problem of finding infinitely many probabilities $\mu_i$, each satisfying only the one constraint
$\mu_i(\chi_i) > 0$.

For instance, as in the earlier construction of Cournot and Gaifman probabilities, for any separating model $I_i$ of $\chi_i$, the point probability $\mu_i(\varphi) := [I_i \in mod(\varphi)]$
satisfies $\mu_i(\chi_i) = 1$. This also shows that some Cournot (and Gaifman) $\mu$ can be built purely
from deterministic measures $\mu_i \in \{0, 1\}$, i.e. sets of models. Corollary 53 below illustrates more
generally how Theorem 50 can help.

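Proof. With the notation of Definition 49 we have:

(Cournot $\Leftarrow$) Assume $\varphi$ has a separating model.

Then $\varphi = \chi_i$ for some $i$, and hence $\mu(\varphi) = \mu(\chi_i) \geq m_i \mu_i(\chi_i) = m_i > 0$.

(& Gaifman $\Leftarrow$) A linear combination $\mu$ of Gaifman $\mu_i$ is itself Gaifman.

(Cournot $\Rightarrow$) Consider the $\mathbb{N}$-partition

$\mathcal{T} := \{i \in \mathbb{N} : \mu(\chi_i) = 1\}$,

$\mathcal{E} := \{i \notin \mathcal{T} : \chi_i$ starts with an even (incl. zero) number of negations $\neg\}$,

$\mathcal{O} := \{i \notin \mathcal{T} : \chi_i$ starts with an odd number of negations $\neg\}$,

and let $c : \mathcal{E} \to \mathcal{O}$ biject $\chi_{c(i)} = \neg\chi_i$. Let $\varphi$ be an arbitrary sentence.

For $i \in \mathcal{E}$: $\mu(\varphi) = \mu(\varphi|\chi_i)\,\mu(\chi_i) + \mu(\varphi|\neg\chi_i)\,\mu(\neg\chi_i)$.
Write $p_i := \mu(\chi_i)$ and $\mu_i := \mu(\cdot|\chi_i)$. Let $\sum_{i \in \mathcal{E}} r_i = 1$ and $r_i > 0$ and $m_i = \frac{1}{2} r_i p_i > 0$ and $m_{c(i)} = \frac{1}{2} r_i (1 - p_i) > 0$ for $i \in \mathcal{E}$. Then

$$\tfrac{1}{2}\mu(\varphi) = \tfrac{1}{2}\sum_{i \in \mathcal{E}} r_i\,\mu(\varphi) = \sum_{i \in \mathcal{E}} \big[ m_i\,\mu(\varphi|\chi_i) + m_{c(i)}\,\mu(\varphi|\neg\chi_i) \big] = \sum_{i \in \mathcal{E} \cup \mathcal{O}} m_i\,\mu_i(\varphi).$$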


The next theorem is a complete characterization of general and (strongly) Cournot or Gaifman
probabilities on sentences. It is based on a tree construction: Consider a sequence of (some
or all) sentences $\varphi_1, \varphi_2, \varphi_3, ...$ arranged in a finite or infinite complete binary tree with all left
(right) children at depth $n$ labeled by $\neg\varphi_n$ ($\varphi_n$) as depicted below. Furthermore, each node
stores the $\mu$-probability of the conjunction $\psi_{n,S}$ of sentences along the edges from the root to
this node.




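Proposition 51 ($\psi_{n,S}$-tree) For $i = 1, ..., n$, let $\varphi_i$ be a sentence. For each $S \subseteq \{1:n\} \equiv \{1, ..., n\}$, define the sentence $\psi_{n,S}$ by

$$\psi_{n,S} := \Big(\bigwedge_{i \in S} \varphi_i\Big) \wedge \Big(\bigwedge_{j \in \{1:n\} \setminus S} \neg\varphi_j\Big).$$

Then the following hold.

1. The $\psi_{n,S}$'s are pairwise disjoint.

2. $\bigvee_{S \subseteq \{1:n\}} \psi_{n,S}$ is valid.

3. For each $i = 1, ..., n$, $\varphi_i$ is logically equivalent to $\bigvee_{S \subseteq \{1:n\} : i \in S} \psi_{n,S}$.

Proof. Straightforward. ■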
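The three claims of Proposition 51 can be checked mechanically in a toy setting. The following is a minimal sketch, assuming a purely propositional stand-in for the logic: sentences are hypothetical Python predicates over truth assignments to two atoms, so that models can be enumerated by brute force. None of the names below come from the paper.

```python
from itertools import combinations, product


def models(sentence, num_atoms):
    """Truth assignments (tuples of bools) that satisfy a propositional sentence."""
    return {w for w in product([False, True], repeat=num_atoms) if sentence(w)}


def atom_models(sentences, S, num_atoms):
    """Models of psi_{n,S}: phi_i holds exactly for the (1-based) indices i in S."""
    return {w for w in product([False, True], repeat=num_atoms)
            if all(phi(w) == (i in S) for i, phi in enumerate(sentences, start=1))}


def subsets(n):
    """All subsets of {1, ..., n} as frozensets."""
    idx = range(1, n + 1)
    return [frozenset(c) for r in range(n + 1) for c in combinations(idx, r)]


# Hypothetical example sentences over atoms p = w[0] and q = w[1].
phis = [lambda w: w[0], lambda w: w[0] or w[1], lambda w: not w[1]]
K = 2
atoms = {S: atom_models(phis, S, K) for S in subsets(len(phis))}

# Claim 1: the psi_{n,S} are pairwise disjoint.
pairwise_disjoint = all(not (atoms[S] & atoms[T])
                        for S in atoms for T in atoms if S != T)
# Claim 2: their disjunction is valid (covers every assignment).
exhaustive = set().union(*atoms.values()) == set(product([False, True], repeat=K))
# Claim 3: phi_i is equivalent to the disjunction of the psi_{n,S} with i in S.
recovers = all(models(phi, K) == set().union(*(atoms[S] for S in atoms if i in S))
               for i, phi in enumerate(phis, start=1))
```

In the paper's setting satisfiability is undecidable, so such a brute-force enumeration is only available in decidable fragments like this one.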

The following is our main characterization theorem. It states necessary and sufficient conditions
on the labels $\alpha_{n,S} := \mu(\psi_{n,S})$, for general $\mu$, as well as (strongly) Cournot $\mu$, and sufficient
conditions for Gaifman $\mu$. We do not yet have a complete tree characterization of Gaifman
probabilities, which is a major open problem. The characterization can easily be converted to a
procedure that assigns probabilities to one sentence after the other, but it is not an algorithm,
since satisfiability is not decidable.



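For $i \in \mathcal{T}$ define $\mu_i(\varphi) := \mu(\varphi) = \mu(\varphi|\chi_i)$ and choose $m_i > 0$ with $\sum_{i \in \mathcal{T}} m_i = \frac{1}{2}$. Then $\frac{1}{2}\mu(\varphi) = \sum_{i \in \mathcal{T}} m_i \mu_i(\varphi)$. Adding both representations gives

$$\mu(\varphi) = \sum_{i \in \mathcal{E} \cup \mathcal{O}} m_i \mu_i(\varphi) + \sum_{i \in \mathcal{T}} m_i \mu_i(\varphi) = \sum_{i=1}^{\infty} m_i \mu_i(\varphi)$$

with $\sum_{i=1}^{\infty} m_i = 1$, $m_i > 0$, $\mu_i(\chi_i) = 1$ as needed.
(& Gaifman $\Rightarrow$) $\mu$ Gaifman implies $\mu_i = \mu(\cdot|\chi_i)$ Gaifman.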
Theorem 52 (tree characterization of general/Cournot/Gaifman probabilities) Let
the alphabet be countable and $\varphi_1, \varphi_2, \varphi_3, ...$ an enumeration of all sentences. For each $n \geq 1$
and each $S \subseteq \{1:n\}$, define the sentence $\psi_{n,S}$ by

$$\psi_{n,S} := \Big(\bigwedge_{i \in S} \varphi_i\Big) \wedge \Big(\bigwedge_{j \in \{1:n\} \setminus S} \neg\varphi_j\Big).$$

1. Let $\mu$ be a probability on sentences. Then, for each $n \geq 1$,

$$\mu(\varphi_n) = \sum_{S \subseteq \{1:n\} : n \in S} \mu(\psi_{n,S}). \qquad (1)$$

Furthermore, $\mu$ is Cournot (resp., strongly Cournot) iff, for each $n \geq 1$ and $S \subseteq \{1:n\}$,
$\psi_{n,S}$ has a separating model (resp., is satisfiable) implies $\mu(\psi_{n,S}) > 0$.

2. For each $n \geq 1$ and $S \subseteq \{1:n\}$, let $\alpha_{n,S} \in \mathbb{R}$ satisfy the following conditions.

(a) $\alpha_{n,S} \geq 0$.

(b) If $\psi_{n,S}$ is unsatisfiable, then $\alpha_{n,S} = 0$.

(c) $\alpha_{n,S} = \alpha_{n+1,S} + \alpha_{n+1,S \cup \{n+1\}}$.

(d) $\sum_{S \subseteq \{1:n\}} \alpha_{n,S} = 1$.

Then there exists a probability $\mu$ on sentences such that, for each $n \geq 1$ and each $S \subseteq \{1:n\}$, $\mu(\psi_{n,S}) = \alpha_{n,S}$.

3. Suppose that, in addition to the conditions in Part 2, the following condition also holds:
for each $n \geq 1$ and $S \subseteq \{1:n\}$, $\psi_{n,S}$ has a separating model (resp., is satisfiable) implies
$\alpha_{n,S} > 0$. Then $\mu$ is Cournot (resp., strongly Cournot).

4. Suppose that the conditions of Part 2 hold. Strengthen 2(b) by demanding that if $\psi_{n,S}$ has
no separating model, then $\alpha_{n,S} = 0$. Further, assume that the enumeration $\varphi_1, \varphi_2, ...$ is such
that if $\varphi_{n+1} = [r = s]$ for terms $r$ and $s$ having the same function type, then $\varphi_{n+2} = \bigvee_{S \subseteq \{1:n\}} \psi_{n,S} \wedge \varphi\{x/t_S\}$,
where $\varphi := [(r\,x) = (s\,x)]$ and $t_S$ is such that $\psi_{n,S} \wedge \neg\varphi\{x/t_S\}$
has a separating model (if no such $t_S$ exists, choose $t_S$ arbitrarily or drop this contribution
from $\bigvee$). For $\varphi_{n+1} = [r = s]$ also set $\alpha_{n+2,S} = \alpha_{n+1,S}$. Then $\mu$ is Gaifman.

5. For every probability $\mu$, $\alpha_{n,S} := \mu(\psi_{n,S})$ satisfies 2(a)-(d).
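The conditions 2(a)-(d) describe how mass is pushed down the binary tree one sentence at a time. The following is a minimal sketch of such a procedure, assuming a propositional toy logic in which satisfiability can be decided by brute force (which is exactly what fails in the paper's expressive logic). The even split between satisfiable children is an arbitrary illustrative choice, and all function names are hypothetical.

```python
from itertools import product


def satisfiable(constraints, num_atoms):
    """Brute force: does some assignment give every sentence its required truth value?"""
    return any(all(phi(w) == val for phi, val in constraints)
               for w in product([False, True], repeat=num_atoms))


def grow_tree(sentences, num_atoms):
    """Return levels[n] = {S: alpha_{n,S}} for n = 0, ..., len(sentences)."""
    levels = [{frozenset(): 1.0}]
    for n, phi in enumerate(sentences, start=1):
        nxt = {}
        for S, alpha in levels[-1].items():
            # The path to node (n-1, S) fixes the truth values of phi_1..phi_{n-1}.
            path = [(sentences[i - 1], i in S) for i in range(1, n)]
            sat_pos = satisfiable(path + [(phi, True)], num_atoms)
            sat_neg = satisfiable(path + [(phi, False)], num_atoms)
            if sat_pos and sat_neg:
                share = 0.5          # both children satisfiable: split evenly
            elif sat_pos:
                share = 1.0          # only the phi_n child is satisfiable
            else:
                share = 0.0          # only the negated child (or neither)
            nxt[S | {n}] = alpha * share          # child labeled phi_n
            nxt[S] = alpha * (1.0 - share)        # child labeled not phi_n
        levels.append(nxt)
    return levels


# Hypothetical example over atoms p = w[0] and q = w[1].
example_sentences = [lambda w: w[0], lambda w: w[0] or w[1], lambda w: not w[1]]
levels = grow_tree(example_sentences, num_atoms=2)
# mu(phi_3) is the total mass of the depth-3 nodes whose index set contains 3.
mu_phi3 = sum(a for S, a in levels[3].items() if 3 in S)
```

By construction each node's mass equals the sum of its two children (condition (c)), unsatisfiable nodes receive zero (condition (b)), and in this toy every satisfiable node keeps positive mass, which mirrors the extra condition of Part 3.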

Items 1, 2, 3, and 5 are rather natural. The somewhat ugly item 4 requires explanation: First,
the assumption on the enumeration $\varphi_i$ can easily be satisfied by inserting appropriate $\varphi_{n+2}$ at
the required $n$. The intuition behind the construction for $n = 0$ is that if $I$ is a model of $\neg\varphi_1$,
i.e. of $\exists x.\neg\varphi$, Gaifman requires a witness $t$, which exists by the extensionality axiom. We can
guarantee such a witness by putting $\varphi_2 = \varphi\{x/t\}$ and following exclusively the $\neg\varphi_2$ branch by
setting $\alpha_{2,\{2\}} = 0$. For general $n$, the witnesses $t$ and hence $\varphi_{n+2} = \varphi\{x/t\}$ may depend on $S$;
this would lead to a branch-dependent enumeration of sentences. There is nothing wrong with
this, and it is probably even the preferred solution. In order to keep things simple, we kept the
enumeration branch-independent by or-ing $\varphi_{n+2}$ over all $2^n$ branches, which makes it formally
independent of the branch $S$.

Proof. 1. The first part follows immediately from Parts 1 and 3 of Proposition 51 and
Proposition 19.6.

The second part for strongly Cournot follows immediately from the definition of a strongly
Cournot probability, Proposition 51.3, and Proposition 19.4: That strongly Cournot implies
$\mu(\psi_{n,S}) > 0$ for satisfiable $\psi_{n,S}$ is trivial. For the other direction, $\varphi_n$ is satisfiable implies that
there exists an $S \ni n$ for which $\psi_{n,S}$ is satisfiable. Hence $\mu(\varphi_n) \geq \mu(\psi_{n,S}) > 0$. Thus $\mu(\varphi) > 0$
for all satisfiable $\varphi$, and so $\mu$ is strongly Cournot. The proof for the Cournot case is similar.

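2. First define $\mu_0 : \{\psi_{n,S}\}_{n \geq 1, S \subseteq \{1:n\}} \to \mathbb{R}$ by

$$\mu_0(\psi_{n,S}) := \alpha_{n,S}$$

for each $n \geq 1$ and $S \subseteq \{1:n\}$. We prove by induction that, for $m \geq n$,

$$\mu_0(\psi_{n,S}) = \sum_{R : S \subseteq R \subseteq S \cup \{n+1, ..., m\}} \alpha_{m,R}.$$

The result is obvious when $m = n$. Suppose now it holds for $m$. Then

$$\begin{aligned}
\mu_0(\psi_{n,S}) &= \sum_{R : S \subseteq R \subseteq S \cup \{n+1, ..., m\}} \alpha_{m,R} && \text{[Induction hypothesis]} \\
&= \sum_{R : S \subseteq R \subseteq S \cup \{n+1, ..., m\}} \big(\alpha_{m+1,R} + \alpha_{m+1,R \cup \{m+1\}}\big) && \text{[Condition 2(c)]} \\
&= \sum_{R : S \subseteq R \subseteq S \cup \{n+1, ..., m+1\}} \alpha_{m+1,R}.
\end{aligned}$$

This completes the induction argument.
Now define $\mu : \mathcal{S} \to \mathbb{R}$ by

$$\mu(\varphi_n) := \sum_{S \subseteq \{1:n\} : n \in S} \alpha_{n,S}$$

for each $n \geq 1$. We prove by induction that, for $m \geq n$,

$$\mu(\varphi_n) = \sum_{S \subseteq \{1:m\} : n \in S} \alpha_{m,S}.$$

The result is obvious when $m = n$. Suppose now it holds for $m$. Then

$$\begin{aligned}
\mu(\varphi_n) &= \sum_{S \subseteq \{1:m\} : n \in S} \alpha_{m,S} && \text{[Induction hypothesis]} \\
&= \sum_{S \subseteq \{1:m\} : n \in S} \big(\alpha_{m+1,S} + \alpha_{m+1,S \cup \{m+1\}}\big) && \text{[Condition 2(c)]} \\
&= \sum_{S \subseteq \{1:m+1\} : n \in S} \alpha_{m+1,S}.
\end{aligned}$$

This completes the induction argument.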

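We show that $\mu$ extends $\mu_0$. Suppose that $\psi_{n,S}$, for some $n \geq 1$ and $S \subseteq \{1:n\}$, is $\varphi_k$
for some $k \geq 1$. Let $m = \max\{k, n\}$ and $\mathcal{A} = \{R : k \in R \subseteq \{1, ..., m\}\}$ and $\mathcal{B} = \{R : S \subseteq R \subseteq S \cup \{n+1, ..., m\}\}$.
Then $\bigvee_{R \in \mathcal{A}} \psi_{m,R}$ is logically equivalent to $\varphi_k$, which is equal to $\psi_{n,S}$,
which is logically equivalent to $\bigvee_{R \in \mathcal{B}} \psi_{m,R}$. Also the $\psi_{m,R}$ are pairwise disjoint. Hence $\psi_{m,R}$ is
unsatisfiable (and so $\alpha_{m,R} = 0$) for each $R \in (\mathcal{A} \setminus \mathcal{B}) \cup (\mathcal{B} \setminus \mathcal{A})$. This implies

$$\mu(\varphi_k) = \sum_{R \in \mathcal{A}} \alpha_{m,R} = \sum_{R \in \mathcal{B}} \alpha_{m,R} = \mu_0(\psi_{n,S}).$$

In summary, $\mu : \mathcal{S} \to \mathbb{R}$ is well-defined and satisfies

$$\mu(\varphi_n) = \sum_{S \subseteq \{1:n\} : n \in S} \mu(\psi_{n,S}),$$

for each $n \geq 1$.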

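We show that $\mu$ is a probability. Clearly, $\mu$ is non-negative. Now suppose that, for some
$n \geq 1$, $\varphi_n$ is valid.

Then, for $n \notin S$, $\psi_{n,S}$ is a conjunction that contains $\neg\varphi_n$, hence is not satisfiable and
therefore $\alpha_{n,S} = 0$ for $n \notin S$. This implies

$$\mu(\varphi_n) = \sum_{S \subseteq \{1:n\} : n \in S} \alpha_{n,S} = \sum_{S \subseteq \{1:n\}} \alpha_{n,S} = 1.$$

Finally, suppose that $\neg(\varphi_n \wedge \varphi_m)$ is valid. There exists $k \geq 1$ such that $\varphi_k$ is $\varphi_n \vee \varphi_m$. Choose any
$p$ greater than $n$, $m$ and $k$. Consider $\mathcal{A} := \{S \subseteq \{1:p\} : k \in S\}$ and $\mathcal{B} := \{S \subseteq \{1:p\} : n \in S\}$
and $\mathcal{C} := \{S \subseteq \{1:p\} : m \in S\}$.

$\alpha_{p,S} = 0$ for $S \in \mathcal{B} \cap \mathcal{C}$, since $\psi_{p,S}$ is a conjunction containing $\varphi_n \wedge \varphi_m$.

$\alpha_{p,S} = 0$ for $S \in \mathcal{A} \setminus (\mathcal{B} \cup \mathcal{C})$, since $\psi_{p,S}$ is a conjunction containing $\varphi_k \wedge \neg\varphi_n \wedge \neg\varphi_m$.

$\alpha_{p,S} = 0$ for $S \in (\mathcal{B} \cup \mathcal{C}) \setminus \mathcal{A}$, since $\psi_{p,S}$ is a conjunction containing $\neg\varphi_k$ together with $\varphi_n$ or $\varphi_m$.

Together this implies

$$\mu(\varphi_n \vee \varphi_m) = \mu(\varphi_k) = \sum_{S \in \mathcal{A}} \alpha_{p,S} = \sum_{S \in \mathcal{B}} \alpha_{p,S} + \sum_{S \in \mathcal{C}} \alpha_{p,S} = \mu(\varphi_n) + \mu(\varphi_m).$$

Thus $\mu$ is a probability on sentences.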

3. For the strongly Cournot case, suppose that, for some $n \geq 1$, $\varphi_n$ is satisfiable. Thus
$\psi_{n,S'}$ is satisfiable for some $S' \subseteq \{1:n\}$ for which $n \in S'$. By the condition, $\mu(\psi_{n,S'}) > 0$.
Hence $\mu(\varphi_n) = \sum_{S \subseteq \{1:n\} : n \in S} \mu(\psi_{n,S}) > 0$. The Cournot case is similar.

4. 3x.<^ has separating model (s.m.) ^ there exists t such that ^{x/t} has s.m. The 
direction follows from Definition [12] with r = Ax.<^ and s = Xx.T. The •<= direction follows 
from (p{x/t} — >■ 3x.ip. 

We need to show the Gaifman condition in Definition 20. This is equivalent to: For all
terms $r$ and $s$ having the same function type, $\mu(r = s) = \lim_{m \to \infty} \mu(\bigwedge_{i=1}^{m} ((r\,t_i) = (s\,t_i)))$.
Fix $r$ and $s$, define $\varphi := [(r\,x) = (s\,x)]$. Using the extensionality axiom we hence have to show

$$\mu(\forall x.\varphi) = \lim_{m \to \infty} \mu\Big(\bigwedge_{i=1}^{m} \varphi\{x/t_i\}\Big).$$

Consider $n$ such that $\varphi_{n+1} = [r = s] = \forall x.\varphi$. By assumption, $\varphi_{n+2} = \bigvee_{S \subseteq \{1:n\}} \psi_{n,S} \wedge \varphi\{x/t_S\}$.

We first prove that setting $\alpha_{n+2,S} = \alpha_{n+1,S}$ is allowed:
Assume $\psi_{n,S} \wedge \neg\varphi_{n+1} = \exists x.(\psi_{n,S} \wedge \neg\varphi)$ has s.m.
$\Rightarrow$ There exists $t_S$ s.th. $\psi_{n,S} \wedge \neg\varphi\{x/t_S\}$ has s.m.
$\Rightarrow$ $\psi_{n,S} \wedge \neg\varphi_{n+1} \wedge \neg\varphi\{x/t_S\}$ has s.m., since $\neg\varphi\{x/t_S\}$ implies $\neg\varphi_{n+1}$.
The last expression is logically equivalent to $\psi_{n,S} \wedge \neg\varphi_{n+1} \wedge \neg\varphi_{n+2}$, since for $\psi_{n,S} = \bot$, both
expressions are false, and for $\psi_{n,S} = \top$, $\psi_{n,S'} = \bot$ for all $S' \neq S$, hence $\bigvee_S$ in $\varphi_{n+2}$ collapses to
$\varphi\{x/t_S\}$. Since $\psi_{n,S} \wedge \neg\varphi_{n+1} \wedge \neg\varphi_{n+2}$ has s.m., $\alpha_{n+2,S} = \alpha_{n+1,S}$ is allowed. Assume now that
$\psi_{n,S} \wedge \neg\varphi_{n+1}$ has no s.m. Then $\psi_{n,S} \wedge \neg\varphi_{n+1} \wedge \neg\varphi_{n+2}$ has neither, and $\alpha_{n+2,S} = \alpha_{n+1,S} = 0$.
Hence (4.) is a consistent instantiation of (2.) and generates a probability on sentences $\mu$ with
$\mu(\psi_{n',S'}) = \alpha_{n',S'}$ for all $n'$ and $S'$. We now prove that it is Gaifman.
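For $\mu(\forall x.\varphi \wedge \psi_{n,S}) > 0$, trivially

$$\mu\Big(\bigwedge_{i=1}^{m} \varphi\{x/t_i\} \,\Big|\, \forall x.\varphi \wedge \psi_{n,S}\Big) = 1 = \mu(\forall x.\varphi \mid \forall x.\varphi \wedge \psi_{n,S}).$$

For $\mu(\neg\forall x.\varphi \wedge \psi_{n,S}) > 0$ and sufficiently large $m$,

$$\mu\Big(\bigwedge_{i=1}^{m} \varphi\{x/t_i\} \,\Big|\, \neg\forall x.\varphi \wedge \psi_{n,S}\Big) = 0 = \mu(\forall x.\varphi \mid \neg\forall x.\varphi \wedge \psi_{n,S}),$$

since $\mu(\neg\varphi\{x/t_S\} \mid \neg\forall x.\varphi \wedge \psi_{n,S}) = \mu(\neg\varphi_{n+2} \mid \neg\forall x.\varphi \wedge \psi_{n,S}) = \alpha_{n+2,S}/\alpha_{n+1,S} = 1$, and
$\bigwedge_{i=1}^{m} \varphi\{x/t_i\}$ will eventually contradict $\neg\varphi\{x/t_S\}$.

Since both displayed equalities hold for all $S \subseteq \{1:n\}$, for sufficiently large $m$ this implies
$\mu(\bigwedge_{i=1}^{m} \varphi\{x/t_i\}) = \mu(\forall x.\varphi)$.

5. Straightforward. ■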

Unfortunately items 3 and 4 in Theorem 52 cannot be combined. The $\mu$ in item 4 is not
Cournot, since e.g. $\neg\varphi_{n+1} \wedge \varphi_{n+2}$ has a separating model if there is more than one possible
witness $t_S$, but is assigned zero probability. We can do something else though.

The following corollary boosts the Gaifman $\mu$ constructed in Theorem 52.4 with the rigid mixture
representation to a Gaifman and Cournot $\mu$, and this without having to choose interpretations
$I$ as required in Theorem 50.

Corollary 53 (Gaifman and Cournot probability) Let $\chi_1, \chi_2, ...$ be an enumeration of all
sentences that have a separating model. For each $i$, let $\varphi_1 := \chi_i, \varphi_2, \varphi_3, ...$ be different (in the
first sentence) enumerations of all sentences, and $\mu_i$ be a corresponding Gaifman probability
constructed in Theorem 52.4, choosing $\mu_i(\chi_i) = \alpha_{1,\{1\}} := 1$. Then by Theorem 50, the rigid
mixture $\mu$ of Definition 49 is Gaifman and Cournot.

6 Relative Entropy of Probabilities on Sentences 

Assume we "know" the probabilities $\mu_0(\varphi_i)$ of sentences $\varphi_1, ..., \varphi_n$. Note that $\mu_0 : \{\varphi_1, ..., \varphi_n\} \to [0, 1]$
is not a probability on all sentences, but only a partial specification. In the next section
(Proposition 57) we derive conditions under which $\mu_0$ can be extended to a probability over all
sentences.

However, if there are any solutions at all, then there are many. It then makes sense to ask 
whether some distributions that meet our constraints are "better", in some sense, than others. 

A natural idea is to choose $\mu$ in such a way as to be "as uninformative as possible",
consistent with our constraints as defined by $\mu_0$. Unfortunately it is not possible to define "as
uninformative as possible" in absolute terms, but we can define it relative to a prior distribution,
$\xi$. We will formalise this using the concept of relative entropy, or Kullback-Leibler
divergence. We now show that this selection of $\mu$ has exactly the form of a piecewise re-scaled
$\xi$, and show how to find the optimal rescaling constants, the various $\alpha_S$ introduced in the next
section, under this criterion. Natural choices for the prior $\xi$ are the non-dogmatic probabilities
constructed in Theorem HUJ 

We start by introducing the relevant concepts on general measure spaces before constructing
the new distribution $\mu$ that meets our constraints while being uninformative relative to our
prior, $\xi$.

From [Iha93, p.21] [Csi75]: Let $\mu^*$ and $\xi^*$ be probabilities on a measurable space $(X, \mathcal{B}(X))$.
We say that $\mu^*$ is absolutely continuous with respect to $\xi^*$, $\mu^* \ll \xi^*$, if $\mu^*(A) = 0$ for every
$A \in \mathcal{B}(X)$ such that $\xi^*(A) = 0$. By the Radon-Nikodym theorem [Dud02, Theorem 5.5.4], if
$\mu^*$ is absolutely continuous with respect to $\xi^*$, then there exists a $\xi^*$-integrable function $\psi(x)$
such that

$$\mu^*(A) = \int_A \psi(x)\,d\xi^*(x), \quad \forall A \in \mathcal{B}(X).$$

The function $\psi(x)$ is called the Radon-Nikodym derivative and is written in the form $\psi(x) = \frac{d\mu^*}{d\xi^*}(x)$.

For probabilities $\mu^*$ and $\xi^*$ on $(X, \mathcal{B}(X))$, the relative entropy $KL(\mu^*\|\xi^*)$ of $\mu^*$ with respect
to $\xi^*$ is defined by

$$KL(\mu^*\|\xi^*) := \begin{cases} \int_X \log\frac{d\mu^*}{d\xi^*}(x)\,d\mu^*(x) & \text{if } \mu^* \ll \xi^*, \\ \infty & \text{otherwise.} \end{cases}$$

The measure $\xi^*$ is referred to as the reference measure.
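On a finite space the Radon-Nikodym derivative is just the ratio of point masses, so the definition above reduces to a finite sum. The following is a minimal sketch under that assumption, with probabilities given as plain lists; the function name is illustrative only.

```python
import math


def relative_entropy(mu, xi):
    """KL(mu || xi) for two probability vectors on the same finite space."""
    total = 0.0
    for m, x in zip(mu, xi):
        if m == 0.0:
            continue            # convention: 0 * log(0 / x) = 0
        if x == 0.0:
            return math.inf     # mu is not absolutely continuous w.r.t. xi
        total += m * math.log(m / x)
    return total


kl_same = relative_entropy([0.5, 0.5], [0.5, 0.5])
kl_point = relative_entropy([1.0, 0.0], [0.5, 0.5])
kl_not_ac = relative_entropy([0.5, 0.5], [1.0, 0.0])
```

The third call illustrates the "otherwise" branch: the first argument puts mass on a point that the reference measure rules out, so the relative entropy is infinite.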

By reference to this general definition for relative entropy, we can define the relative entropy 
for probabilities on sentences in two ways: 

Definition 54 (relative entropy on sentences) For a countable alphabet and for probabilities
$\mu$ and $\xi$ defined on some set of sentences $\mathcal{S}$, the relative entropy $KL(\mu\|\xi)$ of $\mu$ with respect
to $\xi$ is defined by

$$KL(\mu\|\xi) := \lim_{n \to \infty} \sum_{S \subseteq \{1:n\}} \mu(\psi_{n,S}) \log\frac{\mu(\psi_{n,S})}{\xi(\psi_{n,S})} = KL(\mu^*\|\xi^*)$$

where $0 \log\frac{0}{\xi} := 0$ and $\mu \log\frac{\mu}{0} := \infty$ if $\mu > 0$. The last equality holds true if $\mu^*$ and $\xi^*$ are the
probabilities in Proposition 31 on interpretations that correspond to $\mu$ and $\xi$ respectively.

The first definition is more general and useful and conceptually easier. Since the relative 
entropy increases with refinement, the limit always exists and is independent of the order of 
enumeration of sentences. The second definition is the "obvious" choice for a definition, but is 
more restrictive and based on much heavier machinery. Equivalence follows from exchanging 
limits with integrals, which requires some justification. 

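Proof. (sketch) (i) Order independence: Let $\Phi$ be a finite set of sentences, and $KL_\Phi(\mu\|\xi)$ be
the relative entropy of the sentences in $\Phi$. Then by the monotonicity of the relative entropy
under refinement, $\Phi \subseteq \Phi'$ implies $KL_\Phi \leq KL_{\Phi'}$. It is now routine to establish independence of
the limit on the order of enumeration of the sentences.

(ii) Equivalence of both definitions: For $\mu^* \not\ll \xi^*$ one can show that the limit diverges, which
implies equality. We will only prove the interesting case when $\mu^* \ll \xi^*$. Let $\varphi_1, \varphi_2, ...$ be an
enumeration of all sentences. For an interpretation $I \in \mathcal{I}$, let $S$ be such that $\psi_{n,S}$ is valid in $I$, i.e.
$S = S(n, I) := \{i \in \{1, ..., n\} : I \in mod(\varphi_i)\}$. Using $\mu(\psi_{n,S}) = \mu^*(mod(\psi_{n,S})) = \int_{mod(\psi_{n,S})} d\mu^*$,
let

$$KL_n(\mu\|\xi) := \sum_{S \subseteq \{1:n\}} \mu(\psi_{n,S}) \log\frac{\mu(\psi_{n,S})}{\xi(\psi_{n,S})} = \int_{\mathcal{I}} \log\frac{\mu(\psi_{n,S(n,I)})}{\xi(\psi_{n,S(n,I)})}\,d\mu^*(I) \geq 0,$$

$$KL^*(\mu\|\xi) := \int_{\mathcal{I}} \log\frac{d\mu^*}{d\xi^*}\,d\mu^* \geq 0.$$

Elementary algebra (telescoping property of KL) allows us to split $KL^*$ into a finitary and a
tail part

$$KL^*(\mu\|\xi) = KL_n(\mu\|\xi) + \sum_{S \subseteq \{1:n\}} \mu(\psi_{n,S})\,KL^*\big(\mu(\cdot|\psi_{n,S})\,\|\,\xi(\cdot|\psi_{n,S})\big)$$

which shows that $KL^* \geq KL_n$.

For the other direction, let $\mathcal{F}_n$ be the Borel $\sigma$-algebra generated by $\{mod(\psi_{n,S}) : S \subseteq \{1:n\}\}$.
Then $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq ...$ is a filtration with $\mathcal{F}_\infty = \mathcal{B}$ the Borel $\sigma$-algebra generated by $\bigcup_{n=1}^{\infty} \mathcal{F}_n$.
Define

$$Z_n(I) := \frac{\mu^*(mod(\psi_{n,S}))}{\xi^*(mod(\psi_{n,S}))} = \frac{\mu(\psi_{n,S})}{\xi(\psi_{n,S})}, \quad S = S(n, I).$$

$Z_n : \mathcal{I} \to \mathbb{R}$ is an $\mathcal{F}_n$-measurable function, well-defined with $\xi$-probability 1 (w.$\xi$.p.1). $Z_1, Z_2, ...$
forms a $\xi$-martingale sequence, since

$$E_\xi[Z_{n+1} \mid \mathcal{F}_n] = \frac{\mu(\psi_{n+1,S})}{\xi(\psi_{n+1,S})}\,\xi(\psi_{n+1,S}|\psi_{n,S}) + \frac{\mu(\psi_{n+1,S \cup \{n+1\}})}{\xi(\psi_{n+1,S \cup \{n+1\}})}\,\xi(\psi_{n+1,S \cup \{n+1\}}|\psi_{n,S}) = \frac{\mu(\psi_{n,S})}{\xi(\psi_{n,S})} = Z_n.$$

Since $\mu^* \ll \xi^*$, by [Doo53, VII§8] the sequence converges to the Radon-Nikodym derivative

$$\lim_{n \to \infty} Z_n = \frac{d\mu^*}{d\xi^*} \quad \text{w.}\xi\text{.p.1}$$

Now consider

$$KL_n(\mu\|\xi) = \sum_{S \subseteq \{1:n\}} \frac{\mu(\psi_{n,S})}{\xi(\psi_{n,S})} \log\frac{\mu(\psi_{n,S})}{\xi(\psi_{n,S})}\,\xi(\psi_{n,S}) = \int_{\mathcal{I}} Z_n \log Z_n\,d\xi^*.$$

By Fatou's lemma applied to $1 + Z_n \log Z_n$, which is non-negative, and the existence of the
pointwise limit of $Z_n$ w.$\xi$.p.1, we get

$$\liminf_{n \to \infty} KL_n(\mu\|\xi) \geq \int_{\mathcal{I}} \liminf_{n \to \infty} Z_n \log Z_n\,d\xi^* = \int_{\mathcal{I}} \frac{d\mu^*}{d\xi^*} \log\frac{d\mu^*}{d\xi^*}\,d\xi^* = KL^*(\mu\|\xi).$$

Since $KL_n$ is monotone increasing and together with $KL^* \geq KL_n$, we have $\lim_{n \to \infty} KL_n = KL^*$.
This shows the equivalence of both definitions in Definition 54. ■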

Given some base measure $\xi^*$, we are interested in finding a measure $\mu^*$ that minimizes
$KL(\mu^*\|\xi^*)$ under some constraints

$$\int f_i(x)\,d\mu^*(x) = a_i, \quad i = 1, ..., n. \qquad (2)$$

We assume that these constraints are satisfiable for some $\mu^* \ll \xi^*$.

[Iha93] defines the KL-projection of a probability under some constraints as the measure 
that minimises the relative entropy subject to those constraints. In practice, the KL-projection 
is defined by giving a Radon- Nikodym derivative that re-scales the original probability to meet 
the constraints. This is similar to the rescaling used in the proof of Proposition [57] below. 

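[Iha93, pp. 104-5] proves the following: Define functions $\theta_i(\lambda)$, $i = 1, ..., n$, of $\lambda = (\lambda_1, ..., \lambda_n) \in \mathbb{R}^n$ by

$$\theta_i(\lambda) = \frac{1}{\Phi(\lambda)} \int_X f_i(x) \exp\Big\{\sum_{j=1}^{n} \lambda_j f_j(x)\Big\}\,d\xi^*(x), \quad i = 1, ..., n,$$

where

$$\Phi(\lambda) = \int_X \exp\Big\{\sum_{j=1}^{n} \lambda_j f_j(x)\Big\}\,d\xi^*(x).$$

We denote by $\Lambda$ the set of all $\lambda$ for which the integrals above converge, and define a set
$\Theta \subseteq (\mathbb{R} \cup \{-\infty\})^n$ by

$$\Theta = \{(\theta_1(\lambda), ..., \theta_n(\lambda)) ; \lambda \in \Lambda\}.$$

Let $M_1$ be the set of all probabilities on $(X, \mathcal{B}(X))$, $\xi^* \in M_1$ be a fixed reference measure,
and $f_i(x)$, $i = 1, ..., n$ be real functions defined on $X$. Assume that $F \subseteq M_1$ is a set of the form

$$F = \Big\{\mu^* \in M_1 : \int_X f_i(x)\,d\mu^*(x) = a_i, \ i = 1, ..., n\Big\},$$

where $a_i$, $i = 1, ..., n$ are given constants such that $(a_1, ..., a_n) \in \Theta$. Then the KL-projection $\hat\mu^*$
on $F$ is given by

$$\frac{d\hat\mu^*}{d\xi^*}(x) = \frac{1}{\Phi(\lambda)} \exp\Big\{\sum_{i=1}^{n} \lambda_i f_i(x)\Big\}, \qquad (3)$$

where $\lambda = (\lambda_1, ..., \lambda_n) \in \Lambda$ is a vector uniquely determined by solving

$$\theta_i(\lambda) = a_i, \quad i = 1, ..., n.$$

The corresponding minimum relative entropy is given by

$$KL(\hat\mu^*\|\xi^*) = \sum_{i=1}^{n} \lambda_i a_i - \log\Phi(\lambda).$$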
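For a single indicator constraint on a finite space the equations above can be solved in closed form, which makes the exponential-tilting structure easy to inspect. The following is a minimal sketch under those assumptions (finite space, one constraint of the form "the set $A$ has probability $a$", with $0 < a < 1$ and $0 < \xi^*(A) < 1$); the function name and the example numbers are hypothetical.

```python
import math


def kl_projection_single(xi, A, a):
    """KL-projection of xi onto {mu : mu(A) = a} on a finite space.

    xi is a list of point masses, A a set of indices, and a the target mass.
    Returns the projected masses, the multiplier lambda, and Phi(lambda).
    """
    xi_A = sum(xi[x] for x in A)
    # Solve theta(lam) = exp(lam) * xi_A / Phi = a for lam.
    lam = math.log(a * (1.0 - xi_A) / ((1.0 - a) * xi_A))
    Phi = math.exp(lam) * xi_A + (1.0 - xi_A)        # normalizing constant
    mu = [xi[x] * math.exp(lam if x in A else 0.0) / Phi for x in range(len(xi))]
    return mu, lam, Phi


prior = [0.1, 0.2, 0.3, 0.4]
event = {0, 1}
target = 0.6
projected, lam, Phi = kl_projection_single(prior, event, target)

# Relative entropy computed directly, and via the formula lambda * a - log Phi.
kl_direct = sum(m * math.log(m / x) for m, x in zip(projected, prior))
kl_formula = lam * target - math.log(Phi)
```

Inside $A$ every point is rescaled by the same factor, and likewise outside $A$, which is the piecewise-constant density that Equation (3) predicts for indicator constraints.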
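We will construct, where possible, a function $\hat\mu$ that has minimum relative entropy with
respect to $\xi$ while still satisfying our constraints as represented by $\mu_0$ and the $\varphi_i$, $i = 1, ..., n$.
First, we construct a function $\hat\mu^*$ on interpretations that will end up meeting our constraints
while minimising the relative entropy to $\xi^*$.

Choose the $f_i = [mod(\varphi_i)]$ as indicator function on $X = \mathcal{I}$, which is 1 on models of $\varphi_i$ and
zero elsewhere. Set $a_i = \mu_0(\varphi_i)$, $i = 1, ..., n$. The constraints (2) then reduce to

$$\mu(\varphi_i) = \mu^*(mod(\varphi_i)) = \int [mod(\varphi_i)]\,d\mu^* = a_i = \mu_0(\varphi_i)$$

as intended.

Equation (3) then tells us that the scaling function $\frac{d\hat\mu^*}{d\xi^*}$ between $\xi^*$ and $\hat\mu^*$ is piecewise
constant. In particular, $\frac{d\hat\mu^*}{d\xi^*}$ is constant across each of the sets $mod(\psi_S)$ related to the sentences
$\psi_S$ constructed in Proposition 51:

$$\begin{aligned}
\hat\mu(\varphi) = \hat\mu^*(mod(\varphi)) &= \int_{mod(\varphi)} \frac{d\hat\mu^*}{d\xi^*}\,d\xi^* = \sum_{S \subseteq \{1:n\}} \int_{mod(\varphi \wedge \psi_S)} \frac{d\hat\mu^*}{d\xi^*}\,d\xi^* \\
&= \sum_{S \subseteq \{1:n\}} \frac{1}{\Phi(\lambda)} \exp\Big\{\sum_{i=1}^{n} \lambda_i f_i(mod(\psi_S))\Big\}\,\xi^*(mod(\varphi \wedge \psi_S)) && \text{[Equation (3); } f_i \text{ constant on } mod(\psi_S)\text{]} \\
&= \frac{1}{\Phi(\lambda)} \sum_{S \subseteq \{1:n\}} \exp\Big\{\sum_{i \in S} \lambda_i\Big\}\,\xi(\varphi \wedge \psi_S) && \text{[} f_i = 1 \text{ iff } i \in S\text{]}
\end{aligned}$$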



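This leads to the following definition for $\hat\mu$:

Definition 55 (minimally more informative probability) Let $\xi$ be an arbitrary probability
on sentences, and $\mu_0 : \{\varphi_1, ..., \varphi_n\} \to [0, 1]$ constrain the probability $\hat\mu$ of the sentences
$\varphi_1, ..., \varphi_n$. Let

$$\begin{aligned}
\hat\mu(\varphi) &:= \sum_{S \subseteq \{1:n\}} w_S\,\xi(\varphi \wedge \psi_S) && \text{[Defining equation]} \\
w_S &:= \frac{1}{\Phi(\lambda)} \exp\Big\{\sum_{i \in S} \lambda_i\Big\} && \text{[Weights]} \\
\Phi(\lambda) &:= \sum_{S \subseteq \{1:n\}} \exp\Big\{\sum_{i \in S} \lambda_i\Big\}\,\xi(\psi_S) && \text{[Normalizing constant]} \\
\mu_0(\varphi_i) &= \sum_{S \subseteq \{1:n\} : S \ni i} w_S\,\xi(\psi_S) && \text{[Consistency equations for } \lambda_i \in \mathbb{R} \cup \{-\infty\}\text{]}
\end{aligned}$$

if the expressions are well-defined and a solution exists. Otherwise $\hat\mu$ is undefined. We call $\hat\mu$
minimally more informative than $\xi$ given $\mu_0$ (if it exists).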
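To make Definition 55 concrete, the sketch below computes the atom probabilities $\hat\mu(\psi_S) = w_S\,\xi(\psi_S)$ on a finite toy example. It does not solve the consistency equations for $\lambda$ directly; instead it uses iterative proportional fitting as a stand-in, which rescales one constraint at a time and keeps the exponential form of the weights. It assumes that the prior masses $\xi(\psi_S)$ are given explicitly and that the constraints are feasible; all names are hypothetical.

```python
from itertools import combinations


def subsets(n):
    """All subsets of {1, ..., n} as frozensets."""
    idx = range(1, n + 1)
    return [frozenset(c) for r in range(n + 1) for c in combinations(idx, r)]


def minimally_more_informative(xi_atoms, mu0, sweeps=200):
    """Rescale prior atom masses so that every constrained sentence gets mass mu0[i].

    xi_atoms maps each S to xi(psi_S); mu0 maps index i to the target mu0(phi_i).
    Returns a dict mapping S to the rescaled mass of psi_S.
    """
    mu = dict(xi_atoms)
    for _ in range(sweeps):
        for i, target in mu0.items():
            mass_in = sum(p for S, p in mu.items() if i in S)
            mass_out = 1.0 - mass_in
            scale_in = target / mass_in if mass_in > 0 else 0.0
            scale_out = (1.0 - target) / mass_out if mass_out > 0 else 0.0
            for S in mu:
                # Atoms containing i are scaled together, the rest together.
                mu[S] *= scale_in if i in S else scale_out
    return mu


# Hypothetical example: two sentences, uniform prior over the four atoms.
prior = {S: 0.25 for S in subsets(2)}
constraints = {1: 0.7, 2: 0.2}
mu_hat = minimally_more_informative(prior, constraints)
weights = {S: mu_hat[S] / prior[S] for S in prior}   # local scaling factors w_S
```

With a uniform prior the result is the product distribution with the prescribed marginals, which matches the intuition that nothing beyond the two constraints has been added.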

For $\varphi = \psi_{S'}$, only the term $S = S'$ contributes to the defining equation, which gives the
useful relation $\hat\mu(\psi_{S'}) = w_{S'}\,\xi(\psi_{S'})$. So indeed, $w_S = \hat\mu(\psi_S)/\xi(\psi_S)$ is the local scaling factor.
Inserting this back into the defining equation gives

$$\hat\mu(\varphi) = \sum_{S \subseteq \{1:n\}} \hat\mu(\psi_S)\,\xi(\varphi|\psi_S). \qquad (4)$$

This also implies that if $\xi$ is Gaifman, then $\hat\mu$ is Gaifman. Furthermore, $\hat\mu(\varphi) > 0$ whenever
this is possible consistently with $\mu_0$ and $\xi(\varphi) > 0$, i.e. for (strongly) Cournot $\xi$, $\hat\mu$ is as "Cournot" as
possible.

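Proposition 56 (minimally more informative probability) If $\mu_0$ can be extended to a
probability on $\mathcal{S}$, and the prior satisfies $\xi(\psi_{n,S}) > 0$ for all satisfiable $\psi_{n,S}$, then $\hat\mu$ in Definition 55 is
the unique minimum of the relative entropy w.r.t. $\xi$ under the constraints $\mu(\varphi_i) = \mu_0(\varphi_i)$,
$i = 1, ..., n$:

$$\min_{\mu : \mu(\varphi_i) = \mu_0(\varphi_i),\, i = 1..n} \{KL(\mu\|\xi)\} = KL(\hat\mu\|\xi) = \sum_{S \subseteq \{1:n\}} \hat\mu(\psi_S) \log\frac{\hat\mu(\psi_S)}{\xi(\psi_S)} = \sum_{i=1}^{n} \lambda_i \mu_0(\varphi_i) - \log\Phi(\lambda)$$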

Proof. A measure-theoretic proof can be based on the second definition in Definition 54 and
Equation (3). Here we give an elementary proof based on the first definition: First note that the
sum over $S$ is well defined and finite, since $\xi(\psi_S) = 0$ implies $\psi_S$ unsatisfiable implies $\mu(\psi_S) = 0$
by Proposition 19.3. Therefore, wherever necessary or convenient, we interpret sums as being
restricted to those $S$ for which $\psi_S$ is satisfiable. We have

$$KL(\mu\|\xi) = \sum_{S \subseteq \{1:n\}} \mu(\psi_{n,S}) \log\frac{\mu(\psi_{n,S})}{\xi(\psi_{n,S})} + \lim_{m \to \infty} \sum_{S \subseteq \{1:n\}} \mu(\psi_{n,S}) \sum_{T \subseteq \{n+1:m\}} \mu(\psi_{m,S \cup T}|\psi_{n,S}) \log\frac{\mu(\psi_{m,S \cup T}|\psi_{n,S})}{\xi(\psi_{m,S \cup T}|\psi_{n,S})}$$

By multiplying the first term with $1 = \sum_{T \subseteq \{n+1:m\}} \mu(\psi_{m,S \cup T}|\psi_{n,S})$ and elementary algebra
one can easily verify that this expression indeed reduces to the first one in Definition 54. Now
we need to minimize this w.r.t. $\mu$. The first term involves a constrained minimization over
the $2^n - 1$ "parameters" $\mu(\psi_S) : S \subseteq \{1:n\}$. The second term (for fixed $m$) involves a free
minimization over the $2^n(2^{m-n} - 1)$ parameters $\mu(\psi_{m,S \cup T}|\psi_{n,S}) : T \subseteq \{n+1:m\}, S \subseteq \{1:n\}$.
Since the two parameter sets are independent, we can minimize both terms separately. Since
there are no constraints for the second minimization, and the second term is monotone
increasing in $m$, the unique solution is obviously $\mu(\psi_{m,S \cup T}|\psi_{n,S}) = \xi(\psi_{m,S \cup T}|\psi_{n,S})$. As for the first
term, since $\xi(\psi_{n,S}) > 0$ and the relative entropy is non-negative, continuous and strictly
convex, and the domain is finite-dimensional, convex and compact (a $2^n - 1$ dimensional
probability simplex), it has a unique minimum on the convex subspace generated by the linear
constraints. With Lagrange multipliers and differentiation one can derive the consistency
equations in Definition 55, which uniquely determine the solution (this follows the same line
of reasoning as after Definition but now in finite sample spaces this is elementary). ■ 

The next section will develop necessary and sufficient conditions under which $\mu_0$ can be
extended to some $\mu$ and hence a minimally more informative $\hat\mu$.
7 Extension of Probabilities



Maintaining consistency in large knowledge bases is a non-trivial problem. Its probabilistic 
cousin studied in this section is no easier: Given some probabilistic knowledge, does this cor- 
respond to a coherent set of probabilistic beliefs? 

More formally, suppose a finite set of sentences are given pre-determined probabilities. An 
interesting, and practically important, question is: what are necessary and sufficient conditions 
for the existence of a probability on sentences that gives precisely these probabilities on the 
finite set of sentences? The next result answers this question. 

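Proposition 57 (extension of probabilities) Let the alphabet be countable,
$\{\varphi_1, ..., \varphi_n\}$ be a finite set of sentences, and $\mu_0 : \{\varphi_1, ..., \varphi_n\} \to [0, 1]$ a function. For each
$S \subseteq \{1:n\}$, let

$$\psi_S := \Big(\bigwedge_{i \in S} \varphi_i\Big) \wedge \Big(\bigwedge_{j \in \{1:n\} \setminus S} \neg\varphi_j\Big).$$

Then $\mu_0$ can be extended to a (Gaifman) probability $\mu : \mathcal{S} \to \mathbb{R}$ iff the following set of equations
for the $2^n$ variables $\alpha_S$, for $S \subseteq \{1:n\}$, has a solution:

$$\begin{aligned}
&\sum_{S \subseteq \{1:n\}} \alpha_S = 1 \\
&\sum_{S \subseteq \{1:n\} : i \in S} \alpha_S = \mu_0(\varphi_i), && \text{for } i = 1, ..., n \\
&\alpha_S \geq 0, && \text{for } S \subseteq \{1:n\} \\
&\alpha_S = 0 \text{ if } \psi_S \text{ has no (separating) model}, && \text{for } S \subseteq \{1:n\}.
\end{aligned}$$

If the above conditions on $\alpha_S$ are met, then Proposition 56 and the remark before it imply
that $\mu_0$ can in particular be extended to a probability $\hat\mu$ that is minimally more informative
than some prior $\xi$, and $\hat\mu$ is Gaifman if $\xi$ is.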
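The four conditions of Proposition 57 are easy to verify for a given candidate solution. The following is a minimal sketch of such a check, assuming that satisfiability of each $\psi_S$ is supplied from outside as a table (in the paper's logic it cannot be computed in general). It only verifies a candidate and does not search for one; the names and the example are hypothetical.

```python
from itertools import combinations


def subsets(n):
    """All subsets of {1, ..., n} as frozensets."""
    idx = range(1, n + 1)
    return [frozenset(c) for r in range(n + 1) for c in combinations(idx, r)]


def solves_extension_equations(alpha, mu0, satisfiable, tol=1e-9):
    """Check the four conditions on alpha_S for constraints mu0 (dict i -> value)."""
    all_S = subsets(len(mu0))
    total_ok = abs(sum(alpha[S] for S in all_S) - 1.0) <= tol
    marginals_ok = all(
        abs(sum(alpha[S] for S in all_S if i in S) - mu0[i]) <= tol for i in mu0)
    nonneg_ok = all(alpha[S] >= 0 for S in all_S)
    support_ok = all(alpha[S] == 0 for S in all_S if not satisfiable[S])
    return total_ok and marginals_ok and nonneg_ok and support_ok


# Hypothetical example: phi_2 implies phi_1, so "not phi_1 and phi_2" is unsatisfiable.
sat_table = {frozenset(): True, frozenset({1}): True,
             frozenset({2}): False, frozenset({1, 2}): True}
mu0 = {1: 0.7, 2: 0.2}
alpha_ok = {frozenset(): 0.3, frozenset({1}): 0.5,
            frozenset({2}): 0.0, frozenset({1, 2}): 0.2}
# This candidate has the right marginals but puts mass on the unsatisfiable atom.
alpha_bad = {frozenset(): 0.3, frozenset({1}): 0.6,
             frozenset({2}): 0.1, frozenset({1, 2}): 0.1}
ok = solves_extension_equations(alpha_ok, mu0, sat_table)
bad = solves_extension_equations(alpha_bad, mu0, sat_table)
```

Finding a solution, as opposed to checking one, is a linear feasibility problem in the $2^n$ variables $\alpha_S$ once the satisfiability table is known.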

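Proof. ($\Rightarrow$) Suppose first that $\mu_0$ can be extended to a probability $\mu : \mathcal{S} \to \mathbb{R}$. We show that
the set of equations has a solution.

Define $\alpha_S = \mu(\psi_S)$, for each $S \subseteq \{1:n\}$. Since the $\psi_S$'s are pairwise disjoint, by the
definition of a probability, Proposition 19.6, and Proposition 51.2, $\sum_{S \subseteq \{1:n\}} \alpha_S = 1$. Also
$\sum_{S \subseteq \{1:n\} : i \in S} \alpha_S = \mu(\varphi_i) = \mu_0(\varphi_i)$, by Propositions 51.3 and 19.6. Since $\mu$ is a probability,
$\alpha_S \geq 0$ for $S \subseteq \{1:n\}$. Finally, $\alpha_S = 0$ if $\psi_S$ is unsatisfiable for $S \subseteq \{1:n\}$, by Proposition 19.3.
(In case $\mu$ is Gaifman, we use $\mu^*$ of Proposition 31 to show that $\alpha_S = \mu(\psi_S)$ equals the $\mu^*$-measure of the set of separating models of $\psi_S$, which is
$\mu^*(\emptyset) = 0$ if $\psi_S$ has no separating model.)

($\Leftarrow$) Conversely, suppose that the equations have a solution. Let $\xi$ be a strongly Cournot
probability on $\mathcal{S}$ (whose existence was established earlier). Put

$$Sat = \{S \subseteq \{1:n\} \mid \psi_S \text{ is satisfiable}\}.$$

Define $\mu : \mathcal{S} \to \mathbb{R}$ by

$$\mu(\varphi) := \sum_{S \in Sat} w_S\,\xi(\varphi \wedge \psi_S)$$

for $\varphi \in \mathcal{S}$, where $w_S := \alpha_S/\xi(\psi_S)$ for $S \in Sat$. The function $\mu$ is well-defined, since $\xi(\psi_S) > 0$
if $\psi_S$ is satisfiable. We claim that $\mu$ is a probability on sentences. Clearly, $\mu$ is non-negative.
Suppose that $\varphi$ is valid. Then

$$\begin{aligned}
\mu(\varphi) &= \sum_{S \in Sat} \alpha_S\,\xi(\varphi|\psi_S) \\
&= \sum_{S \in Sat} \alpha_S && \text{[}\varphi \text{ is valid and } \xi(\cdot|\psi_S) \text{ is a probability]} \\
&= 1. && \text{[}\alpha_S = 0 \text{ for } S \notin Sat\text{]}
\end{aligned}$$

Suppose that $\neg(\varphi \wedge \psi)$ is valid. Then

$$\begin{aligned}
\mu(\varphi \vee \psi) &= \sum_{S \in Sat} w_S\,\xi((\varphi \vee \psi) \wedge \psi_S) && \text{[definition of } \mu\text{]} \\
&= \sum_{S \in Sat} w_S \big[\xi(\varphi \wedge \psi_S) + \xi(\psi \wedge \psi_S)\big] && \text{[}\neg((\varphi \wedge \psi_S) \wedge (\psi \wedge \psi_S)) \text{ valid]} \\
&= \mu(\varphi) + \mu(\psi).
\end{aligned}$$

Thus $\mu$ is a probability on sentences.
Finally, $\mu$ extends $\mu_0$:

$$\mu(\varphi_i) = \sum_{S \in Sat} w_S\,\xi(\varphi_i \wedge \psi_S) = \sum_{S \in Sat : i \in S} \alpha_S = \sum_{S \subseteq \{1:n\} : i \in S} \alpha_S = \mu_0(\varphi_i)$$

for $i = 1, ..., n$, which completes the proof. (To prove that $\mu$ is Gaifman, simply replace 'is
satisfiable' by 'has a separating model', in particular in $Sat$, and '$\xi$ strongly Cournot' by '$\xi$
Cournot and Gaifman' in the above proof.) ■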



Next we study conditions on the set of sentences which guarantee that the equations of 
Proposition [57] have a solution. First, a necessary condition is introduced. 

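Definition 58 (subadditive $\mu_0$) Let $\{\varphi_1, ..., \varphi_n\}$ be a finite set of sentences and $\mu_0 : \{\varphi_1, ..., \varphi_n\} \to [0, 1]$
a function. Then $\mu_0$ is subadditive if, for each $i, i_1, ..., i_k \in \{1, ..., n\}$
such that the sentences $\varphi_{i_1}, ..., \varphi_{i_k}$ are pairwise disjoint and $\bigvee_{j=1}^{k} \varphi_{i_j} \to \varphi_i$ is valid,

$$\sum_{j=1}^{k} \mu_0(\varphi_{i_j}) \leq \mu_0(\varphi_i) \quad \text{and}$$

$$\sum_{j=1}^{k} \mu_0(\varphi_{i_j}) = \mu_0(\varphi_i) \quad \text{if additionally } \varphi_i \to \bigvee_{j=1}^{k} \varphi_{i_j} \text{ is valid.}$$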

Here is another necessary condition that will be needed. 

Definition 59 (eligible $\mu_0$) Let $\{\varphi_1, ..., \varphi_n\}$ be a finite set of sentences and $\mu_0 : \{\varphi_1, ..., \varphi_n\} \to [0, 1]$
a function. Then $\mu_0$ is eligible if, for each $i = 1, ..., n$, $\mu_0(\varphi_i) = 0$ if
$\varphi_i$ is unsatisfiable.
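Both conditions can be tested by brute force when sentences are represented by their sets of models. The following is a minimal sketch under that assumption, with models given as hypothetical integer labels; in this representation "pairwise disjoint" means empty intersections and "$\bigvee_j \varphi_{i_j} \to \varphi_i$ is valid" means set inclusion. The enumeration is exponential in the number of sentences and is meant for illustration only.

```python
from itertools import combinations


def is_eligible(model_sets, mu0):
    """mu0 assigns zero to every sentence without models."""
    return all(mu0[i] == 0 for i, M in model_sets.items() if not M)


def is_subadditive(model_sets, mu0, tol=1e-12):
    """Check the two subadditivity conditions over all families of distinct indices."""
    idx = list(model_sets)
    for i in idx:
        for k in range(1, len(idx) + 1):
            for J in combinations(idx, k):
                disjoint = all(not (model_sets[a] & model_sets[b])
                               for a, b in combinations(J, 2))
                union = set().union(*(model_sets[j] for j in J))
                if not disjoint or not union <= model_sets[i]:
                    continue                      # premise of the definition fails
                total = sum(mu0[j] for j in J)
                if total > mu0[i] + tol:
                    return False                  # sum exceeds mu0(phi_i)
                if union == model_sets[i] and abs(total - mu0[i]) > tol:
                    return False                  # equivalence requires equality
    return True


# Hypothetical example: phi_2 and phi_3 are disjoint and together equivalent to phi_1.
model_sets = {1: frozenset({0, 1, 2}), 2: frozenset({0}), 3: frozenset({1, 2})}
good = {1: 0.6, 2: 0.1, 3: 0.5}
too_heavy = {1: 0.6, 2: 0.3, 3: 0.5}
good_is_subadditive = is_subadditive(model_sets, good)
heavy_is_subadditive = is_subadditive(model_sets, too_heavy)
good_is_eligible = is_eligible(model_sets, good)
```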

Now the conditions of subadditivity and eligibility are shown to be necessary. 

Proposition 60 (subadditive and eligible $\mu_0$) Let $\{\varphi_1, ..., \varphi_n\}$ be a finite set of sentences
and $\mu_0 : \{\varphi_1, ..., \varphi_n\} \to [0, 1]$ a function. Suppose that $\mu_0$ can be extended to a probability on $\mathcal{S}$.
Then $\mu_0$ is subadditive and eligible.
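Proof. Let $\mu : \mathcal{S} \to \mathbb{R}$ be a probability that extends $\mu_0$.

Suppose that, for some $i, i_1, ..., i_k \in \{1, ..., n\}$, the sentences $\varphi_{i_1}, ..., \varphi_{i_k}$ are pairwise disjoint
and $\bigvee_{j=1}^{k} \varphi_{i_j} \to \varphi_i$ is valid. Then

$$\begin{aligned}
\sum_{j=1}^{k} \mu_0(\varphi_{i_j}) &= \sum_{j=1}^{k} \mu(\varphi_{i_j}) && \text{[}\mu \text{ extends } \mu_0\text{]} \\
&= \mu\Big(\bigvee_{j=1}^{k} \varphi_{i_j}\Big) && \text{[Proposition 19.6]} \\
&\leq \mu(\varphi_i) && \text{[Proposition 19.4; } \textstyle\bigvee_{j=1}^{k} \varphi_{i_j} \to \varphi_i \text{ is valid]} \\
&= \mu_0(\varphi_i). && \text{[}\mu \text{ extends } \mu_0\text{]}
\end{aligned}$$

Also

$\varphi_i \to \bigvee_{j=1}^{k} \varphi_{i_j}$ is valid
implies $\mu(\bigvee_{j=1}^{k} \varphi_{i_j}) = \mu(\varphi_i)$
implies $\sum_{j=1}^{k} \mu(\varphi_{i_j}) = \mu(\varphi_i)$
implies $\sum_{j=1}^{k} \mu_0(\varphi_{i_j}) = \mu_0(\varphi_i)$.

Thus $\mu_0$ is subadditive.

For $i \in \{1, ..., n\}$, $\mu_0(\varphi_i) = 0$ if $\varphi_i$ is unsatisfiable, since $\mu$ is a probability and $\mu_0(\varphi_i) = \mu(\varphi_i)$.
Hence $\mu_0$ is eligible.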



Now a further structural condition on the set of sentences is introduced that, together
with subadditivity and eligibility, will be sufficient to guarantee that there is a solution of the
equations.

Definition 61 (hierarchical sentences) A finite set of sentences $\{\varphi_1, ..., \varphi_n\}$ is hierarchical
if, for each $i \neq j$, exactly one of the following holds: $\neg(\varphi_i \wedge \varphi_j)$ is valid or $\varphi_i \to \varphi_j$ is valid or
$\varphi_j \to \varphi_i$ is valid.

Intuitively, Definition 61 states that, if $\varphi_i$ and $\varphi_j$ ($i \neq j$) are sentences, then either they are
disjoint or one of them is stronger than the other. An hierarchical set of sentences is illustrated
in Figure 1. Each circle or oval indicates the set of models of a particular sentence.
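Reading the circles and ovals of the figure as sets of models, the definition becomes a simple test on pairs of sets. The following is a minimal sketch under that assumption, with models given as hypothetical integer labels.

```python
from itertools import combinations


def is_hierarchical(model_sets):
    """For every pair, exactly one of: disjoint, first implies second, second implies first."""
    for A, B in combinations(model_sets, 2):
        disjoint = not (A & B)      # not(phi_i and phi_j) is valid
        a_implies_b = A <= B        # phi_i -> phi_j is valid
        b_implies_a = B <= A        # phi_j -> phi_i is valid
        if [disjoint, a_implies_b, b_implies_a].count(True) != 1:
            return False
    return True


nested = [frozenset({0, 1, 2}), frozenset({0}), frozenset({1, 2})]
overlapping = [frozenset({0, 1}), frozenset({1, 2})]
nested_ok = is_hierarchical(nested)
overlapping_ok = is_hierarchical(overlapping)
```

Note that, read literally, "exactly one" also excludes two logically equivalent sentences (both implications hold) and unsatisfiable sentences (disjointness and implication hold together); the sketch follows that literal reading.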

For the next result, the proof is by induction on the depth of an hierarchical set of sentences; 
we now define the concept of depth. 

Definition 62 (depth of a sentence) Let $\mathcal{H}$ be an hierarchical set of sentences. The depth
of $\varphi \in \mathcal{H}$ is defined to be the length $p$ of the unique sequence $\varphi_1, ..., \varphi_p = \varphi$ of sentences in $\mathcal{H}$
such that (a) $\varphi_{i+1} \to \varphi_i$ is valid, for $i = 1, ..., p-1$; (b) for each $\psi \in \mathcal{H}$, $\varphi_{i+1} \to \psi$ and $\psi \to \varphi_i$
are valid, for some $i$, implies $\psi = \varphi_{i+1}$ or $\psi = \varphi_i$; and (c) for each $\psi \in \mathcal{H}$, $\varphi_1 \to \psi$ is
valid implies $\psi = \varphi_1$.

The depth of $\mathcal{H}$ is the maximum depth of its sentences.

An empty set of sentences has depth 0. The depth of the set of sentences in Figure 1 is 3.

Proposition 63 (extending hierarchical constraints) Let the alphabet be countable,
$\{\varphi_1, ..., \varphi_n\}$ a set of sentences, and $\mu_0 : \{\varphi_1, ..., \varphi_n\} \to [0, 1]$ a subadditive eligible function.
Suppose that $\{\varphi_1, ..., \varphi_n\}$ is hierarchical. Then $\mu_0$ can be extended to a minimally more
informative probability $\hat\mu : \mathcal{S} \to \mathbb{R}$ than some prior $\xi$ (see Definition 55), which is Gaifman if $\xi$
is.
Figure 1: An hierarchical set of sentences



Proof. The proof is by induction on the depth d of the hierarchical set of sentences. 

Suppose first that $d = 0$, that is, the set of sentences is empty. To show that $\mu_0$ can be
extended to a probability $\mu : \mathcal{S} \to \mathbb{R}$, it suffices by Proposition 57 to show that the equations
of that proposition for this case have a solution. Since the index set of the set of sentences is
empty, its only subset is $S = \emptyset$. Furthermore, $\psi_S$ is $\top$. Put $a_S = 1$. Then the first equation
from Proposition 57 is trivially satisfied. The second set of equations does not appear in this
case. Finally, the third and fourth equations are trivially satisfied. This completes the base
case of the induction argument.

Now suppose the result holds for hierarchical sets of sentences having depth $d$. Let
$\{\varphi_1, \ldots, \varphi_n\}$ be an hierarchical set of sentences with depth $d + 1$. Without loss of generality,
we can assume that $\{\varphi_1, \ldots, \varphi_p\}$, for $p < n$, is an hierarchical set of sentences of depth $d$ and
the sentences $\varphi_{p+1}, \ldots, \varphi_n$ all have depth $d + 1$. By the induction hypothesis, $\mu_0$ restricted to
$\{\varphi_1, \ldots, \varphi_p\}$ can be extended to a probability $\mu : \mathcal{S} \to \mathbb{R}$. Thus, by Proposition 57, the following
set of equations has a solution:

$$\sum_{S \subseteq \{1:p\}} a_S = 1$$

$$\sum_{S \subseteq \{1:p\} : i \in S} a_S = \mu_0(\varphi_i), \quad \text{for } i = 1, \ldots, p$$

$$a_S \geq 0, \quad \text{for } S \subseteq \{1:p\}$$

$$a_S = 0 \text{ if } \psi_S \text{ is unsatisfiable}, \quad \text{for } S \subseteq \{1:p\}.$$

Consider a typical sentence $\varphi_i$ of depth $d$ that 'contains' sentences $\varphi_{i_1}, \ldots, \varphi_{i_k}$ of depth $d + 1$.
Thus $\varphi_{i_1}, \ldots, \varphi_{i_k}$ are pairwise disjoint and $\bigvee_{j=1}^{k} \varphi_{i_j} \to \varphi_i$ is valid. (See Figure 2.) Since $\mu_0$ is
subadditive,

$$\sum_{j=1}^{k} \mu_0(\varphi_{i_j}) \leq \mu_0(\varphi_i) \quad \text{and}$$

$$\sum_{j=1}^{k} \mu_0(\varphi_{i_j}) = \mu_0(\varphi_i) \quad \text{if also } \varphi_i \to \bigvee_{j=1}^{k} \varphi_{i_j} \text{ is valid.}$$

Since $\mu_0$ is eligible, $\mu_0(\varphi_{i_j}) = 0$ if $\varphi_{i_j}$ is unsatisfiable, for $j = 1, \ldots, k$. It has to be shown that
when the depth $d + 1$ sentences are added to $\varphi_1, \ldots, \varphi_p$, the corresponding set of equations has
a solution.

Figure 2: Pairwise disjoint sentences $\varphi_{i_1}, \ldots, \varphi_{i_k}$ of depth $d + 1$

To simplify the notation, assume for the moment that $\varphi_{i_1}, \ldots, \varphi_{i_k}$ are all the sentences of
depth $d + 1$, so that the index set $\{1, \ldots, p\}$ for the set of at-most-depth $d$ sentences is expanded
to $\{1, \ldots, n\}$ for the whole set of sentences.

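Let $S_i \subseteq \{1, \ldots, p\}$ be the set of indices of sentences in the 'path' down to $\varphi_i$ in the set of
at-most-depth $d$ sentences, so that $\psi_{S_i} = \varphi_i$ is valid. Now consider the full set of sentences.
Then the following are valid:

$$\psi_{S_i \cup \{i_j\}} = \varphi_{i_j}, \quad \text{for } j = 1, \ldots, k$$

$$\psi_{S_i} = \varphi_i \wedge \neg \bigvee_{j=1}^{k} \varphi_{i_j}.$$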

Included in the equations for the full set of sentences are the following:

$$a_{S_i \cup \{i_j\}} = \mu_0(\varphi_{i_j}), \quad \text{for } j = 1, \ldots, k$$

$$a_{S_i} + a_{S_i \cup \{i_1\}} + \cdots + a_{S_i \cup \{i_k\}} = \mu_0(\varphi_i).$$

(The first $k$ equations are new ones; the last equation replaces $a_{S_i} = \mu_0(\varphi_i)$ in the set of
equations for the at-most-depth $d$ sentences.)

Furthermore, the term $a_{S_i}$ in the first equation of the set of equations for the at-most-depth
$d$ sentences is replaced by $a_{S_i} + a_{S_i \cup \{i_1\}} + \cdots + a_{S_i \cup \{i_k\}}$ in the equations for the full set of
sentences. (This is the only change to the first equation because all the other extra subsets $R$
of $\{1, \ldots, n\}$ that have to be considered lead to $\psi_R$ that are logically equivalent to $\bot$ and hence
have $a_R = 0$.)

Because $\mu_0$ is subadditive and eligible, it is clear that

$$a_S \geq 0, \quad \text{for } S \subseteq \{1:n\}$$

$$a_S = 0 \text{ if } \psi_S \text{ is unsatisfiable}, \quad \text{for } S \subseteq \{1:n\}$$

are satisfied.

Thus the set of equations for the full set of sentences has a solution. The case when there
are extra sentences of depth $d + 1$ 'inside' other $\varphi_j$ is handled in a similar way.

Now use Propositions 56 and 57 to conclude that $\mu_0$ can be extended to a minimally more
informative probability $\mu : \mathcal{S} \to \mathbb{R}$ than some prior $\xi$. This completes the induction argument.

8 User Manual



This section is a brief outlook on how (approximations of) the theory developed in this paper 
might be used in autonomous reasoning agents. We discuss the special case of certain knowledge 
and how it can be used to make inferences about statements that are not logical implications of 
the knowledge base. For instance, if our agent has observed a large number of ravens which are 
all black without exception, how strongly should it believe in the hypothesis that "all ravens are
black"? We construct an agent that can learn in the limit in the usual time-series forecasting 
setting with an observation sequence indexed by natural numbers. 

Certain knowledge. A common case of knowledge is a set of sentences $\varphi_i$, each having
degree of belief 1 (that is, $\mu_0(\varphi_i) = 1$, for $i = 1, \ldots, n$). In other words, there is certainty that
each $\varphi_i$ is valid in the intended interpretation. This corresponds to non-logical axioms in a
theory. Let $\xi$ be a Cournot probability and suppose that $\mu$ is minimally more informative than
$\xi$ given $\mu_0$. In this case, each $\mu(\psi_S)$, for $S \subseteq \{1:n\}$, is uniquely determined.

To see this, suppose that $S \neq \{1:n\}$, say, $i \notin S$. Then $\mu(\psi_S) \leq \mu(\neg\varphi_i) = 1 - \mu(\varphi_i) =
1 - \mu_0(\varphi_i) = 0$, so that $\mu(\psi_S) = 0$. Hence $\mu(\psi_{\{1:n\}}) = 1$. Thus, in this situation, by (4) $\mu$
satisfies

$$\mu(\varphi) = \xi(\varphi \mid \varphi_1 \wedge \cdots \wedge \varphi_n), \qquad (6)$$

for $\varphi \in \mathcal{S}$. Consequently, there is no optimisation to be done: either $\varphi_1 \wedge \cdots \wedge \varphi_n$ is satisfiable
(leading directly to the above definition for $\mu$) or else it is not, in which case there are no
solutions and $\mu$ cannot be defined at all.

A further special case beyond the one just considered is when $\varphi$ is a logical consequence of
$\varphi_1 \wedge \cdots \wedge \varphi_n$. In this case,

$$\mu(\varphi) = 1,$$

as one would expect. Similarly, when $\neg\varphi$ is a logical consequence, then $\mu(\varphi) = 0$.

Note that, while it is important that the prior $\xi$ be Cournot, it is just as important that
the posterior $\mu$ be allowed not to be Cournot. The prior should be Cournot so that the KL
divergence is as widely defined as possible or, more intuitively, to make sure sentences having
a separating model are not forced to have $\mu$-probability 0. On the other hand, the probability
$\mu$ should be allowed to be 0 on sentences having a separating model since the evidence in the
form of the probabilities on $\varphi_1, \ldots, \varphi_n$ may imply this. This is apparent, for example, for the
case where each $\varphi_i$ has probability 1: according to this evidence, any sentence (even one having
a separating model) that is disjoint from $\varphi_1 \wedge \cdots \wedge \varphi_n$ must have $\mu$-probability 0.
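The effect of certain knowledge can be illustrated on a finite toy model. The sketch below assumes a propositional setting with three atoms and a uniform prior over the eight truth assignments, standing in for interpretations and for a prior that is positive on every satisfiable sentence. The names `prob`, `rain`, `wet`, `cold` and `evidence` are hypothetical. With all evidence sentences held with probability 1, the posterior is the prior conditioned on their conjunction.

```python
from fractions import Fraction
from itertools import product

# Toy "interpretations": truth assignments to three atoms (rain, wet, cold).
worlds = list(product([False, True], repeat=3))
prior = {w: Fraction(1, len(worlds)) for w in worlds}  # uniform, positive on every world

def prob(sentence, given=lambda w: True):
    """Prior probability of `sentence` conditioned on `given` (both are predicates on worlds)."""
    den = sum(p for w, p in prior.items() if given(w))
    if den == 0:
        raise ValueError("evidence is unsatisfiable, so the posterior is undefined")
    num = sum(p for w, p in prior.items() if given(w) and sentence(w))
    return num / den

rain = lambda w: w[0]
wet = lambda w: w[1]
cold = lambda w: w[2]
# Evidence held with certainty: phi_1 = rain, phi_2 = (rain -> wet).
evidence = lambda w: rain(w) and ((not rain(w)) or wet(w))

print(prob(wet, evidence))                    # 1   (logical consequence of the evidence)
print(prob(lambda w: not wet(w), evidence))   # 0   (negation is a consequence)
print(prob(cold, evidence))                   # 1/2 (undetermined by the evidence)
```

The third value shows the case of interest: a sentence that is neither implied nor refuted by the evidence keeps a non-trivial degree of belief inherited from the prior.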

Black ravens. Consider the problem of the black ravens, which is one of the most
notorious problems in confirmation theory [Ear93, RH11]. Let the ravens be identified by
positive integers and let $B(i)$ denote the fact that raven $i$ is black. The evidence consists of the
sentences $B(1), \ldots, B(n)$. (Thus $\varphi_i = B(i)$, for $i = 1, \ldots, n$.) Let $\mu_0 : \{B(1), \ldots, B(n)\} \to
[0, 1]$ be defined by $\mu_0(B(i)) = 1$, for $i = 1, \ldots, n$. Thus the degree of belief that the $i$th raven
is black is 1, for $i = 1, \ldots, n$. Suppose that $\xi$ is an uninformative prior that is Cournot and
Gaifman. Since a priori there are no constraints (on $B$), this implies that $\xi(\forall i.B(i)) > 0$. Let
$\mu$ be a probability that is minimally more informative than $\xi$ given $\mu_0$. Thus $\mu$ is given by (6),
that is, $\mu(\varphi) = \xi(\varphi \mid B(1) \wedge \cdots \wedge B(n))$ for $\varphi \in \mathcal{S}$.

Now consider the sentence $\forall i.B(i)$. This is clearly not a logical consequence of the evidence,
but one can use $\mu$ to ascribe a degree of belief that it is true and, furthermore, investigate
what happens to this probability as the number of black ravens increases. Equation (6) and
$\mu_0(B(i)) = 1$, for $i = 1, \ldots, n$, and then Theorem 27 applied to Gaifman and Cournot $\xi$ show
that

$$\mu(\forall i.B(i)) = \xi(\forall i.B(i) \mid B(1) \wedge \cdots \wedge B(n)) \xrightarrow{n \to \infty} 1.$$

Thus, as the number of observed black ravens increases, the degree of belief that all ravens are
black approaches 1. Of course this also implies the weaker statement that our belief in the next
raven being black tends to one:

$$\xi(B(n + 1) \mid B(1) \wedge \cdots \wedge B(n)) \xrightarrow{n \to \infty} 1.$$
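The convergence can be made tangible with a deliberately simple stand-in prior. The sketch below is an assumption for illustration, not the Gaifman and Cournot prior constructed in the paper: it puts weight `w > 0` on the hypothesis that all ravens are black and spreads the remaining weight over independent fair coin flips for each raven's colour. The function names and the value of `w` are hypothetical. Any prior with positive mass on the universal hypothesis behaves qualitatively the same way.

```python
from fractions import Fraction

def evidence_probability(n, w):
    """Toy prior probability of B(1) and ... and B(n): mixture of 'all black' and fair coin flips."""
    return w + (1 - w) * Fraction(1, 2) ** n

def belief_all_black(n, w=Fraction(1, 100)):
    """Posterior of the universal hypothesis after n black ravens (Bayes' rule)."""
    return w / evidence_probability(n, w)

def belief_next_black(n, w=Fraction(1, 100)):
    """Posterior predictive probability that raven n+1 is black."""
    return evidence_probability(n + 1, w) / evidence_probability(n, w)

for n in (0, 10, 20, 50):
    print(n, float(belief_all_black(n)), float(belief_next_black(n)))
```

Both columns increase towards 1 as `n` grows, and the predictive belief in the next raven is never smaller than the belief in the universal hypothesis, matching the remark that the second limit is the weaker statement.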

Naive black ravens. Continuing the preceding example, suppose that, given the evidence
$B(1), \ldots, B(n)$, each having probability 1, one wants to know the degree of belief for $B(n + 1)$.
Consider the tree construction in Theorem 52 for $\xi$ but with sentences $\varphi_1, \varphi_2, \ldots$ only ranging
over $\varphi_i = B(i)$ and uniform $a_{n,S} = 2^{-n}$. Then

$$\xi(B(n + 1) \mid B(1) \wedge \cdots \wedge B(n)) = \frac{\xi(B(1) \wedge \cdots \wedge B(n) \wedge B(n + 1))}{\xi(B(1) \wedge \cdots \wedge B(n))} = \frac{a_{n+1,\{1:n+1\}}}{a_{n,\{1:n\}}} = \frac{1}{2}.$$

Thus, for this prior, knowing the evidence so far, even for large $n$, does not give any information
about $B(n + 1)$. But it gets worse: assume $\xi$ is somehow extended to a probability on all of $\mathcal{S}$.
Then for any $m > n$,

$$\xi(\forall i.B(i) \mid B(1) \wedge \cdots \wedge B(n)) \leq \xi(B(1) \wedge \cdots \wedge B(m) \mid B(1) \wedge \cdots \wedge B(n)) = \left(\tfrac{1}{2}\right)^{m-n},$$

hence $\xi(\forall i.B(i) \mid B(1) \wedge \cdots \wedge B(n)) = 0$ for all $n$, i.e. universal hypotheses cannot be confirmed.
Even more seriously, we would be absolutely sure that non-black ravens exist,

$$\xi(\exists i.\neg B(i) \mid B(1) \wedge \cdots \wedge B(n)) = 1,$$

and no number $n$ of observed black ravens without any counterexamples will ever convince
us otherwise. These conclusions qualitatively hold even when $\varphi_1, \varphi_2, \ldots$ ranges over all or any
subset of quantifier-free/lambda-free sentences. There seem to be no simple local rules for
choosing $a_{n,S}$ that allow confirmation of all universal hypotheses. This shows that it is crucial
to include quantified sentences when constructing a prior and ensure it is Cournot (even when
only making inferences about unquantified sentences like $B(n + 1)$).
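The arithmetic behind the naive prior is short enough to check directly. The sketch below only evaluates the ratios of the uniform base probabilities $a_{n,S} = 2^{-n}$ stated above; the function names are hypothetical.

```python
from fractions import Fraction

def a_uniform(n):
    """Uniform base probability a_{n,S} = 2^{-n}, the same for every S."""
    return Fraction(1, 2 ** n)

def next_black(n):
    """xi(B(n+1) | B(1) and ... and B(n)) under the naive prior."""
    return a_uniform(n + 1) / a_uniform(n)

def bound_on_universal(m, n):
    """xi(B(1) and ... and B(m) | B(1) and ... and B(n)), an upper bound on the
    conditional probability of the universal hypothesis, for m > n."""
    return a_uniform(m) / a_uniform(n)

print(next_black(5), next_black(1000))            # 1/2 1/2
print([bound_on_universal(10 + k, 10) for k in (1, 5, 20)])
```

The predictive probability stays at 1/2 regardless of `n`, and the upper bound shrinks geometrically in `m - n`, which forces the conditional probability of the universal hypothesis to be 0.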

Corollary 64 (learning in the limit) Let $\iota = \mathit{Nat}$, $\varphi$ be a closed term of type $\mathit{Nat} \to o$, $\mu$
be a Gaifman probability on sentences, and $\mu(\forall x.(\varphi\ x)) > 0$. Then

$$\lim_{n \to \infty} \mu(\forall x.(\varphi\ x) \mid (\varphi\ 0) \wedge \cdots \wedge (\varphi\ n)) = 1.$$

This generalizes the black raven example and follows from Theorem 27. In particular,
learning in the limit is possible for the Gaifman and Cournot probability constructed in the
proof of Theorem 40, provided $\forall x.(\varphi\ x)$ has a separating model.

The proof crucially exploits that 0, 1, 2, ... are representatives of all terms of type $\mathit{Nat}$.
As discussed in Example 48, this would no longer be true had we introduced a description
operator into our logic. Corollary 64 would break down and universal hypotheses over the
natural numbers could not be inductively confirmed, not even asymptotically.

Approximations. The construction of the Cournot and Gaifman $\mu$ in the proof of Theorem 40
required determining particular separating models for $\chi_i$ and determining whether they are
also models of other sentences $\varphi$. This has been eased by Corollary 53, which only requires
determining whether sentences $\psi_{n,S}$ have (no) separating model. Still, this is undecidable.

Assume we had some calculus for determining whether sentences have (no) separating model.
Even an asymptotic or approximate or incomplete calculus may be of use. Fix a sequence
on-the-fly of all sentences $\varphi_1, \varphi_2, \varphi_3, \ldots$ satisfying Theorem 52 (4) (once and for all). Determine the
subsequence of all sentences $\chi_1 = \varphi_{j_1}, \chi_2 = \varphi_{j_2}, \ldots$ with separating models (on the fly).

In order to determine $\mu$ to accuracy $\varepsilon > 0$ for some finite number of sentences $\{\varphi_{i_1}, \ldots, \varphi_{i_n}\}$
of interest, we have to perform the tree construction "only" for $\varphi_i \in \{\chi_1, \ldots, \chi_m\}$, where
X^m+i < $\varepsilon$ and up to depth $d = \max\{i_1, \ldots, i_n\}$, i.e. determine finitely many cases. If a
new sentence $\varphi_{i_{n+1}}$ of interest "arrives" or higher precision is needed, $d$ respectively $m$ can be
increased appropriately (this is what is meant by on-the-fly). It is important to expand the
already existing trees with assigned probabilities, rather than restarting the procedure with a
larger $d$, since this can lead to wrong inductive limits if different choices are made every time.

Work flow example for a simple inductive reasoning agent. Below we present an 
example of a fictitious inductive reasoning agent. It is fictitious, since many operations are 
incomputable. In practice one needs to employ approximations at various steps. How to do 
this is an open problem. 

1. Assume the agent has been endowed with some background knowledge, e.g. about kinetics,
colors, biology, birds, etc. Its knowledge is represented in the form of a hierarchical
(Definition 61) set of sentences $\{\varphi_1, \ldots, \varphi_n\}$ that hold for sure ($\mu_0(\varphi_i) = 1$ for some $i$) or with
some probability $0 < \mu_0(\varphi_i) < 1$ for the other $i$. Our expressive higher-order logic provides a
convenient way of doing so [LNTT].

2. Assume $\mu_0$ is subadditive and eligible (Definitions 58 and 59). This may not be so easy
to achieve, and is akin to the general problem of maintaining consistent knowledge bases.

3. Next, use an approximation of a Gaifman and Cournot prior $\xi$, e.g. as defined in the
proof of Theorem 40 or Theorem 50 or Corollary 53, or an approximation thereof as outlined
above. The agent now constructs via Definition 55 the minimally more informative probability
$\mu$, which exists by Proposition 63 and is Gaifman by Proposition 57 and the remark after
Equation (4).

4. Let $o_0, o_1, o_2, \ldots$ be the agent's life-time sequence of past and future observations of all
kinds of objects, ravens and otherwise, all it has observed or will ever observe, e.g. $o_n$ is what the agent
sees $n$ seconds after it has been switched on.

5. Assume current time is $n$, and the agent needs to hypothesize about the world to decide
its next action, e.g. whether some observed regularity is "real". For instance, "if observation at
time $k$ is a raven, is it also black?". We can formalize this with a predicate $\varphi$ of type $\mathit{Nat} \to o$
with the intended interpretation of $(\varphi\ k)$ as "if observation at time $k$ is a raven, it is black".

6. Of course the answer to $(\varphi\ 0), \ldots, (\varphi\ n)$ is immediate, since $o_0, \ldots, o_n$ have already been
observed. If they are all true, the agent may start to wonder whether "all ravens are black",
or formally, whether $\forall x.(\varphi\ x)$ is true. Note that non-raven observations in the sequence are
allowed.

7. If the agent is equipped with our inductive reasoning system, its degree of belief in this
hypothesis is $\mu(\forall x.(\varphi\ x) \mid (\varphi\ 0) \wedge \cdots \wedge (\varphi\ n))$.

8. This result can be the basis for some decision process maximizing some utilities resulting 
in an informed action. 

Is the degree of belief derived in Step 7 and used in Step 8 reasonable? At least asymptotically,
Corollary 64 ensures that in the limit the agent's belief tends to 1, which is very
reasonable. So our system of inductive reasoning at least passes this test. Most other inductive
reasoning systems have difficulties in getting this right [RH11].
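Steps 4 to 7 can be mimicked by a small computable stand-in. The sketch below is only an illustration under strong assumptions: observations are hypothetical `(kind, colour)` pairs, the predicate `phi` encodes "if the observation is a raven, it is black", and the incomputable prior is replaced by a toy mixture that gives weight `w` to the universal hypothesis and otherwise treats each instance $(\varphi\ k)$ as true independently with probability `q`. None of these names or numbers come from the paper.

```python
from fractions import Fraction

def phi(observation):
    """(phi k): if the observation at time k is a raven, it is black."""
    kind, colour = observation
    return kind != "raven" or colour == "black"

def belief_in_universal(observations, w=Fraction(1, 10), q=Fraction(1, 2)):
    """Degree of belief in 'forall x. (phi x)' after the observations so far (toy prior)."""
    belief = w
    for obs in observations:
        if not phi(obs):
            return Fraction(0)  # a single counterexample refutes the hypothesis
        # Bayes update on a true instance: likelihood 1 under the hypothesis, q otherwise.
        belief = belief / (belief + (1 - belief) * q)
    return belief

stream = [("raven", "black"), ("sparrow", "brown"), ("raven", "black")]
print(belief_in_universal(stream))                        # 8/17
print(belief_in_universal(stream + [("raven", "white")])) # 0
```

Non-raven observations count as true instances of `phi`, in line with the remark in Step 6 that they are allowed in the sequence.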

The Monty Hall Problem. The Monty Hall problem is based on a US game show. A 
contestant is presented with three doors. Behind one of the doors is a prize. The other two 
doors have nothing behind them. The contestant is asked to select a door. After the contestant 
selects a door, but before that door is opened, the game host selects and opens one of the other 
two doors. At this point the contestant is again asked to select their preferred door and will 
win whatever is behind this final selection. 

It is expected that the host will not reveal the prize. This constraint means that the host 
will always open a door to reveal nothing behind it. This limits the contestant's second choice 
to either persisting with the door selected originally, or switching to the remaining door. It is 
a known, if counterintuitive, result that the best strategy for the contestant is to switch doors. 

Let $\iota = \mathit{Door}$. We introduce the constants $D_1, D_2, D_3 : \mathit{Door}$ and

$\mathit{playerFirstSelection}, \mathit{hostSelection}, \mathit{prizeDoor} : \mathit{Door} \to o$
$\mathit{unique} : (\mathit{Door} \to o) \to o$.

As we shall see, the function unique is used to capture the constraint on the preceding three
predicates that exactly one door makes each of them true. With those, we can now define a set
of sentences:

$\varphi_1 := (\mathit{unique} = \lambda p.\exists d.((p\ d) \wedge \forall x.((p\ x) \to x = d))) \wedge (\mathit{unique}\ \mathit{playerFirstSelection}) \wedge (\mathit{unique}\ \mathit{hostSelection}) \wedge (\mathit{unique}\ \mathit{prizeDoor})$
$\varphi_2 := (\mathit{prizeDoor}\ d_1)$
$\varphi_3 := (\mathit{prizeDoor}\ d_2)$
$\varphi_4 := (\mathit{playerFirstSelection}\ d_1)$
$\varphi_5 := (\mathit{playerFirstSelection}\ d_2)$
$\varphi_6 := \forall d.((\mathit{hostSelection}\ d) \to (\neg(\mathit{playerFirstSelection}\ d) \wedge \neg(\mathit{prizeDoor}\ d)))$
$\varphi_7 := (\mathit{hostSelection}\ d_1)$
$\varphi_8 := (\mathit{hostSelection}\ d_2)$
$\varphi_9 := \exists d.((\mathit{playerFirstSelection}\ d) \wedge (\mathit{prizeDoor}\ d))$

Selection of the correct prior is very important for this problem. We require that the prior
be symmetric in which door makes the prizeDoor predicate true, and which door makes the
playerFirstSelection predicate true. We also require that there be no correlation between the
doors that make these predicates true in the prior.

We now perform the tree construction from Proposition 51 using the set of sentences above.
We leave out any branches that will have prior probability 0. Because of the requirements that
the prior be symmetric and uncorrelated, each of the leaf nodes of the following tree will have
equal probability. Note that these requirements mean that $\xi(\varphi_2) = 1/3$ rather than 0.5 as
usual.




[Figure: prior tree with levels labelled Predicates unique, Prize location, Player selection]



Assume, without loss of generality, that the prize is located behind door 1. This allows us to
zoom in onto the right-most sub-tree rooted at $\neg\varphi_3$ and add the host door selection predicates.
These predicates are the host constraints, $\varphi_6$, which we will require to be true with probability
1, and then predicates that elicit the host's selection. As not all host selections are legal, some
branches here have probability 0 (shown dashed).




[Figure: sub-tree with levels labelled Player selection, Host constraints, Host selection]



Each of the three major branches with non-zero probability has equal prior probability. Of
these, two ($\neg\varphi_9$) each have the host forced to open one particular door, and hence
the remaining door has the prize: the player is better off swapping. Only on the remaining
branch ($\varphi_9$), where the player correctly guessed the prize initially, is the player better off not swapping,
but this outcome has probability 1/3 against 2/3 for the other two. Hence the player is better off swapping.
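The conclusion can be checked by enumerating the equally likely leaves. The sketch below assumes the symmetric, uncorrelated prior over the prize door and the player's first pick described above, encodes the host constraint $\varphi_6$ as a filter on legal doors, and additionally assumes that the host chooses uniformly when two doors are legal (an assumption that does not affect the result). All variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

doors = (1, 2, 3)
switch_wins = Fraction(0)
stay_wins = Fraction(0)

for prize, first in product(doors, repeat=2):
    leaf = Fraction(1, 9)  # symmetric, uncorrelated prior over (prize, first pick)
    # Host constraint: the opened door is neither the player's pick nor the prize door.
    legal = [d for d in doors if d != first and d != prize]
    for host in legal:
        branch = leaf / len(legal)  # assumption: uniform choice among legal doors
        remaining = next(d for d in doors if d not in (first, host))
        if remaining == prize:
            switch_wins += branch
        if first == prize:
            stay_wins += branch

print(switch_wins, stay_wins)  # 2/3 1/3
```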

9 Discussion 

A key goal of this research is that of integrating logic and probability, a problem that has a
history going back around 300 years and for which three main threads can be discerned. The
oldest by far is the philosophical/mathematical thread that can be traced via Boole [Boo54,
Boo52] back to Jacob Bernoulli in 1713. An extensive historical account of this thread can
be found in [Hai96]: the idea of putting probabilities on sentences goes back to before [Los55],
which contains references to even earlier material; the important Gaifman condition appeared
in [Gai64] and was further developed in [GS82]; in [SK66] the theory is developed for infinitary
logic; overviews of more recent work from a philosophical perspective can be found in [Haj01,
Wil02, Wil08b]. The second thread is that of the knowledge representation and reasoning
community in artificial intelligence, of which [Nil86, Hal90, FH94, Hal03, SA07] are typical
works. The third thread is that of the machine learning community in artificial intelligence,
of which [Mug96, DK03, MMR+05, RD06, MR07, dSB07, KD07, Pfe07, GMR+08] are typical
works.

An important and useful technical distinction that can be made between these various 
approaches is that the combination of logic and probability can be done externally or internally
[Wil08b]: in the external view, probabilities are attached to sentences in some logic; in the
internal view, sentences incorporate statements about probability. One can even mix the two 
cases so that probabilities appear both internally and externally. We now examine each of these 
in turn. 



Probabilities inside sentences. In the internal view, the uncertainty is modeled inside 
the sentences of a theory. For this to be possible, we must make a careful choice of logic; in 
particular, first-order logic (alone) is not expressive enough for this purpose. There has been a 
tradition of extending first-order logic with probabilistic extensions [Hal03, Haj01, Wil02]. A
good alternative approach, studied in [NL09, NLU08], is to simply adopt higher-order logic.
The most crucial property of higher-order logic that we exploit is that it admits so-called higher- 
order functions which take functions as arguments and/or return functions as results. It is this 
property that allows the modelling of, and reasoning about, probabilistic concepts directly in 
higher-order theories. 

Probabilities outside sentences. In contrast to the internal view, almost all other ap- 
proaches to integrating logic and probability model uncertainty by putting probabilities outside 
sentences. This natural idea has been taken up by many researchers and has a large body of 
theoretical support. Here we follow the lead of Gaifman and Snir for first-order logic in [GS82]
(that builds on earlier work in [Gai64]). They showed that, under certain conditions, there is
a probability on sentences that is strictly positive on consistent sentences (that is, those that 
have a model). This is an important property of any probability that is intended to be used as 
a prior in Bayesian inference. An accessible account of this material can be found in [Par94] . 

Other such systems, and there are now many of these, include Bayesian logic programs
[KD07], Markov logic networks [RD06], and stochastic logic programs [Mug96]. While the
intention is usually that the probabilities define (or at least constrain) a distribution on the set
of interpretations, some systems take other approaches. For example, the probabilities can be
used to define a distribution on proofs or a distribution on programs. For a taxonomy of such
systems, see [MR07].

Conclusion. This paper provides much of the foundation for the design of an integrated 
probabilistic reasoning system that can handle probabilities both inside and outside sentences. 
The main challenge for the future lies in the discovery of reasonable approximation schemes for 
the different currently incomputable aspects of the general theory.

A List of Notation

$x, y, z$    variables
$t, r, s$    terms
$\alpha, \beta$    type of a term
$o$    type of the booleans
$\iota$    type of individuals
$\top$    Truth
$\bot$    Falsity
$\varphi, \chi, \psi$    formula = term of type $o$, called sentence if closed
$\mathcal{S}$    set of all sentences
$\mathcal{I}$    set of interpretations
$\hat{\mathcal{I}}$    set of separating interpretations
$I$    interpretation
$mod(\varphi)$    $\{I \in \mathcal{I} \mid \varphi \text{ is valid in } I\}$ = set of models of $\varphi$
$\widehat{mod}(\varphi)$    $\{I \in \hat{\mathcal{I}} \mid \varphi \text{ is valid in } I\}$ = set of separating models of $\varphi$
$\mathcal{B}$    Borel $\sigma$-algebra generated by $\{mod(\varphi) \mid \varphi \in \mathcal{S}\}$, if alphabet countable
$\hat{\mathcal{B}}$    Borel $\sigma$-algebra generated by $\{\widehat{mod}(\varphi) \mid \varphi \in \mathcal{S}\}$, if alphabet countable
$\mu$, (p)    (estimated) probability on sentences
$\mu^*$, ($\hat{\mu}^*$)    probability on sets of (separating) interpretations
$k, n$    natural numbers used for indexing
$\varphi_1, \varphi_2, \ldots$    enumeration of some or all sentences
$S \subseteq \{1:n\}$    $= \{1, \ldots, n\}$ = index of "positive" in ...
$\psi_S \equiv \psi_{n,S}$    $:= (\bigwedge_{i \in S} \varphi_i) \wedge (\bigwedge_{j \in \{1:n\} \setminus S} \neg\varphi_j)$ = hierarchical basis
$a_{n,S}$    $:= \mu(\psi_{n,S})$ = base probabilities
$\xi$    prior probability (usually Gaifman and Cournot)
$\mu_0$    $: \{\varphi_1, \ldots, \varphi_n\} \to [0, 1]$ = constraints on $\mu$: $\mu(\varphi_i) = \mu_0(\varphi_i)$



B List of Definitions, Theorems, Examples, ...

Definition 1 type $\alpha$
Definition 2 term $t$
Definition 5 frame $\{\mathcal{D}_\alpha\}_\alpha$
Definition 6 valuation $V$
Definition 7 variable assignment $\nu$
Definition 8 interpretation $(\{\mathcal{D}_\alpha\}_\alpha, V)$
Definition 9 satisfiable
Definition 10 consistency
Definition 11 logical consequence
Definition 12 separating interpretation/model
Definition 13 extensionally complete
Proposition 14 extensionally complete separating
Proposition 15 existence of separating models
Theorem 16 compactness
Definition 17 probability on sentences
Definition 18 pairwise disjoint sentences
Proposition 19 properties of probability on sentences
Definition 20 Gaifman probability
Proposition 21 Gaifman probability
Proposition 22 limits for countable alphabet
Proposition 23 Gaifman for countable alphabet
Example 24 natural numbers Nat
Definition 25 strongly Cournot probability
Definition 26 Cournot probability
Theorem 27 confirming universal hypotheses
Definition 28 probability on interpretations
Proposition 29
Proposition 30 finite countable additivity
Proposition 31 $\mu \Rightarrow \mu^*$
Proposition 32 separating $\mu^* \Rightarrow \mu$ Gaifman
Proposition 33 Gaifman $\mu \Rightarrow \mu^*$ separating
Corollary 34 $\mu^*(\mathcal{I} \setminus \hat{\mathcal{I}}) = 0 \Leftrightarrow \mu$ Gaifman
Definition 35 strongly Cournot $\mu^*$
Proposition 36 strongly Cournot $\mu^* \Leftrightarrow \mu$
Definition 37 Cournot $\mu^*$
Proposition 38 Cournot $\mu^* \Leftrightarrow \mu$
Definition 39 discrete $\mu^*$
Theorem 40 Cournot and Gaifman probability
Proposition 41 strongly Cournot probability
Example 42 a probability which is not Gaifman
Example 43 a probability which is strongly Cournot but not Gaifman
Example 44 a probability which is Gaifman but not Cournot
Example 45 a probability which is Cournot but not strongly Cournot
Example 46 standard interpretation of Nat
Example 47 non-standard interpretation of Nat
Example 48 the description operator $\iota$
Definition 49 rigid mixture representation
Theorem 50 probability characterization - Gaifman and Cournot
Proposition 51 $\neg\varphi \vee \varphi$-tree
Theorem 52 tree characterization of general/Cournot/Gaifman probabilities
Corollary 53 Gaifman and Cournot probability
Definition 54 relative entropy on sentences
Definition 55 minimally more informative probability
Proposition 56 minimally more informative probability
Proposition 57 extension of probabilities
Definition 58 subadditive $\mu_0$
Definition 59 eligible $\mu_0$
Proposition 60 subadditive and eligible $\mu_0$
Definition 61 hierarchical sentences
Definition 62 depth of a sentence
Proposition 63 extending hierarchical constraints
Corollary 64 learning in the limit